Abstract

We consider the compressed sensing problem, where the object $x_0 \in \mathbb{R}^N$ is to be
recovered from incomplete measurements $y = Ax_0 + z$; here the sensing matrix $A$ is
an $n \times N$ random matrix with iid Gaussian entries and $n < N$. A popular method
of sparsity-promoting reconstruction is $\ell_1$-penalized least-squares reconstruction (aka
LASSO, Basis Pursuit).

It is currently popular to consider the strict sparsity model, where the object $x_0$ is
nonzero in only a small fraction of entries. In this paper, we instead consider the much
more broadly applicable $\ell_p$-sparsity model, where $x_0$ is sparse in the sense of having $\ell_p$
norm bounded by $\xi \cdot N^{1/p}$ for some fixed $0 < p \le 1$ and $\xi > 0$.

We study an asymptotic regime in which $n$ and $N$ both tend to infinity with limiting
ratio $n/N = \delta \in (0, 1)$, both in the noisy ($z \neq 0$) and noiseless ($z = 0$) cases. Under weak
assumptions on $x_0$, we are able to precisely evaluate the worst-case asymptotic minimax
mean-squared reconstruction error (AMSE) for $\ell_1$-penalized least-squares: min over
penalization parameters, max over $\ell_p$-sparse objects $x_0$. We exhibit the asymptotically
least-favorable object (hardest sparse signal to recover) and the maximin penalization.

In the case where $n/N$ tends to zero slowly, i.e. extreme undersampling, our
formulas (normalized for comparison) say that the minimax AMSE of $\ell_1$ penalized
least-squares is asymptotic to $\xi^2 \cdot \left(\frac{2\log(N/n)}{n}\right)^{2/p-1} \cdot (1 + o(1))$. Thus we have not only
the rate but also the constant factor on the AMSE; and the maximin penalty factor
needed to attain this performance is also precisely specified. Other similarly precise
calculations are showcased.

Our explicit formulas unexpectedly involve quantities appearing classically in statistical
decision theory. Occurring in the present setting, they reflect a deeper connection
between penalized $\ell_1$ minimization and scalar soft thresholding. This connection, which
follows from earlier work of the authors and collaborators on the AMP iterative thresholding
algorithm, is carefully explained.

Our approach also gives precise results under weak-$\ell_p$ ball coefficient constraints, as
we show here.

Key Words: Approximate Message Passing. Lasso. Basis Pursuit. Minimax Risk over 
Nearly-Black Objects. Minimax Risk of Soft Thresholding. 
1 Introduction



In the compressed sensing problem, we are given a collection of noisy, linear measurements
of an unknown vector $x_0$

$$y = Ax_0 + z. \tag{1.1}$$

Here the measurement matrix $A$ has dimensions $n$ by $N$, $n < N$, the $N$-vector $x_0$ is the
object we wish to recover and the noise $z \sim \mathsf{N}(0, \sigma^2 I)$. Both $y$ and $A$ are known, both $x_0$
and $z$ are unknown, and we seek an approximation to $x_0$.

Since the equations are underdetermined and noisy, it seems hopeless to recover $x_0$ in
general, but in compressed sensing one also assumes that the object is sparse. In a number
of recent papers, the sparsity assumption is formalized by requiring $x_0$ to have at most $k$
nonzero entries. This $k$-sparse model leads to a simpler analysis, but is highly idealized,
and does not cover situations where a few dominant entries are scattered among many small
but slightly nonzero entries. For such situations, [Don06a] proposed to measure sparsity by
membership in $\ell_p$ balls, $0 < p \le 1$, namely to consider the situation where the $\ell_p$-norm^1 of
$x_0$ is bounded as

$$\|x_0\|_p^p \equiv \sum_{i=1}^{N} |x_{0,i}|^p \le N \xi^p \tag{1.2}$$

for some constraint parameter $\xi$. Here, as $p \to 0$, we recover the $k$-sparse case (aka $\ell_0$
constraint).

Much more is known today about the behavior of reconstruction algorithms under the
$k$-sparse model than in the more realistic $\ell_p$ balls model. In some sense the $k$-sparse model has
been more amenable to precise analysis. In the noiseless setting, precise asymptotic formulas
are now known for the sparsity level $k$ at which $\ell_1$ minimization fails to correctly recover the
object $x_0$ [Don06b, DT05, DT10]. In the noisy setting, precise asymptotic formulas are now
known for the worst-case asymptotic mean-squared error of reconstruction by $\ell_1$-penalized
$\ell_2$ minimization [DMM10, BM11]. By comparison, existing results for the $\ell_p$ balls model are
mainly qualitative estimates, i.e. bounds that capture the correct scaling with the problem
dimensions but involve loose or unspecified multiplicative coefficients. We refer to Section
10.2 for a brief overview of this line of work, and a comparison with our results.

We believe our paper brings the state of knowledge about the $\ell_p$-ball sparsity model to
the same level of precision as for the $k$-sparse model. We consider here the high-dimensional
setting $N, n \to \infty$ with matrices $A$ having iid Gaussian entries. We treat both the noisy
and noiseless cases in a unified formalism and provide precise expressions, including constants,
describing the worst-case large-system behavior of mean-squared error for optimally-tuned
$\ell_1$-penalized reconstructions. Because our expressions are precise, they deserve close
scrutiny; as we show here, this attention is rewarded with surprising insights, such as the
equivalence of undersampling with adding additional noise. Less precise methods could not
provide such insights.

The rest of this introduction reviews the results obtained through our method. 



^1 Throughout this paper we will accept the abuse of terminology of calling $\|\cdot\|_p$ a 'norm', although it is
not a norm for $p < 1$.
1.1 Problem formulation; Preview of Main Results

Our main results concern $\ell_1$-penalized least-squares reconstruction with penalization
parameter $\lambda$:^2

$$\hat{x}_\lambda \equiv \arg\min_{x} \left\{ \frac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1 \right\}. \tag{1.3}$$

This reconstruction rule became popular under the names of LASSO [Tib96] or Basis Pursuit
DeNoising [CD95]. Our analysis involves a large-system limit, which was effectively also
used in [DMM09, DMM10, BM11]. We introduce some convenient terminology:

Definition 1.1. A problem instance $I_{n,N}$ is a triple $I_{n,N} = (x_0^{(N)}, z^{(n)}, A^{(n,N)})$ consisting
of an object $x_0^{(N)}$ to recover, a noise vector $z^{(n)}$, and a measurement matrix $A^{(n,N)}$. A sequence
of instances $S = \{I_{n,N}\}$ is an infinite sequence of such problem instances.

At this level of generality, a sequence of instances is nearly arbitrary. We now make
specific assumptions on the members of each triple. Here and below $\mathbb{I}(\mathcal{P})$ is the indicator
function on property $\mathcal{P}$.

Definition 1.2.

• Object $\ell_p$ sparsity constraint. A sequence $x_0 = (x_0^{(N)})$ belongs
to $\mathcal{X}_p(\xi)$ if (i) $N^{-1}\|x_0^{(N)}\|_p^p \le \xi^p$, for all $N$; and (ii) there exists a sequence $B =
\{B_M\}_{M \ge 0}$ such that $B_M \to 0$, and for every $N$, $\sum_{i=1}^{N} (x_{0,i}^{(N)})^2\, \mathbb{I}(|x_{0,i}^{(N)}| > M) \le B_M N$.

• Noise power constraint. A sequence $z = (z^{(n)})$ belongs to $\mathcal{Z}^2(\sigma)$ if $n^{-1}\|z^{(n)}\|_2^2 \to \sigma^2$.

• Gaussian measurement matrix. $A^{(n,N)} \sim \mathrm{GAUSS}(n, N)$ is an $n \times N$ random
matrix with entries drawn iid from the $\mathsf{N}(0, 1/n)$ distribution.

• The Standard $\ell_p$ Problem Suite $\mathcal{S}_p(\delta, \xi, \sigma)$ is the collection of sequences of instances
$S = \{(x_0^{(N)}, z^{(n)}, A^{(n,N)})\}_{n,N}$ where

(i) $n/N \to \delta$,
(ii) $x_0 \in \mathcal{X}_p(\xi)$,
(iii) $z \in \mathcal{Z}^2(\sigma)$, and
(iv) each $A^{(n,N)}$ is sampled from the Gaussian ensemble $\mathrm{GAUSS}(n, N)$.

The uniform integrability condition $\sum_{i=1}^{N} (x_{0,i}^{(N)})^2\, \mathbb{I}(|x_{0,i}^{(N)}| > M) \le B_M N$ essentially
requires that the $\ell_2$ norm of $x_0^{(N)}$ is not dominated by a small subset of entries. As we
discuss below, it is a fairly weak condition and most likely can be removed because the
least-favorable vectors $x_0$ turn out to have all non-zero entries of the same magnitude.
Finally notice that uniform integrability is implied by the following: there exist $q > 2$, $B < \infty$
such that $\|x_0^{(N)}\|_q^q \le NB$ for all $N$.
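As an illustration of conditions (i) and (ii) in Definition 1.2, the following minimal sketch evaluates both quantities for a finite vector. The helper names `lp_radius` and `tail_energy`, and the test object (a few entries equal to $\pm\mu$, the rest zero), are illustrative assumptions and not part of the paper.

```python
def lp_radius(x, p):
    """Smallest xi such that N^{-1} * sum_i |x_i|^p <= xi^p (condition (i))."""
    return (sum(abs(v) ** p for v in x) / len(x)) ** (1.0 / p)


def tail_energy(x, big_m):
    """N^{-1} * sum_i x_i^2 * I(|x_i| > M), the quantity bounded by B_M in (ii)."""
    return sum(v * v for v in x if abs(v) > big_m) / len(x)


# Hypothetical object: k entries of magnitude mu with alternating signs, rest zero.
N, k, mu, p = 1000, 50, 3.0, 0.5
x0 = [mu if i % 2 == 0 else -mu for i in range(k)] + [0.0] * (N - k)

xi = lp_radius(x0, p)              # equals ((k / N) * mu**p) ** (1 / p)
tail_at_2 = tail_energy(x0, 2.0)   # all nonzero entries exceed M = 2
tail_at_3 = tail_energy(x0, 3.0)   # no entry strictly exceeds M = 3
```

For a vector whose nonzero entries all share one magnitude, the tail energy drops to zero as soon as $M$ reaches that magnitude, so a bounding sequence $B_M \to 0$ is easy to exhibit.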

The fraction $\delta = n/N$ measures the incompleteness of the underlying systems of equations,
with $\delta$ near 1 meaning $n \approx N$ and so nearly complete sampling, and $\delta$ near 0 meaning
$n \ll N$ and so highly incomplete sampling.

Note in particular: the estimand x and the noise z are deterministic sequences of objects, 
while the matrix A is random. In particular, while it may seem natural to pick the noise to 
be random, that is not necessary, and in fact plays no role in our results. 
Also let $\mathrm{AMSE}(\lambda; S)$ denote the asymptotic per-coordinate mean-squared error of the
LASSO reconstruction with penalty parameter $\lambda$, for the sequence of problem instances $S$:

$$\mathrm{AMSE}(\lambda; S) = \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left\{ \|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2 \right\}. \tag{1.4}$$

Here $\hat{x}_\lambda^{(N)}$ denotes the LASSO estimator, and $x_0^{(N)}$ the estimand, on problem instances of
size $N$. Moreover the limsup is taken as $n, N \to \infty$; $n \sim \delta N$. Although in general this
quantity need not be well defined, our results imply that, if the sequence of instances $S$ is
taken from the standard problem suite, this quantity is bounded.

Now the AMSE depends on both $\lambda$, the penalization parameter, and $x$, the sequence of
objects to recover. As in traditional statistical decision theory, we may view the AMSE as
the payoff function of a game against Nature, where Nature chooses the object sequence $x$
and the researcher chooses the threshold parameter $\lambda$. In this paper, Nature is allowed to
pick only sparse objects $x_0^{(N)}$ obeying the constraint $N^{-1}\|x_0^{(N)}\|_p^p \le \xi^p$.

In the case of noiseless information, $y = Ax_0$ (so $z = 0$), this game has a saddlepoint,
and Theorem 4.1 gives a precise evaluation of the minimax AMSE:

$$\sup_{S \in \mathcal{S}_p(\delta,\xi,0)} \inf_{\lambda}\ \mathrm{AMSE}(\lambda; S) = \frac{\delta\, \xi^2}{M_p^{-1}(\delta)^2}. \tag{1.5}$$

The maximin on the left side is the payoff of a zero-sum game.

The function on the right side, $M_p(\cdot)$, is displayed in Figure 1. It evaluates the minimax
MSE in a classical and much discussed problem of statistical decision theory: soft threshold
estimation of random means $X$ satisfying the moment constraint $\mathbb{E}\{|X|^p\} \le \xi^p$ from noisy
data $X + \mathsf{N}(0, 1)$. This problem was studied in [DJ94], and detailed information is known
about $M_p$; see Section 2 for a review.

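In the noisy case, $\sigma > 0$, we have the same setup as before, only now the AMSE will of
course be larger. Theorem 5.1 gives the minimax AMSE precisely:

$$\sup_{S \in \mathcal{S}_p(\delta,\xi,\sigma)} \inf_{\lambda}\ \mathrm{AMSE}(\lambda; S) = \sigma^2 \cdot m_p^*(\delta, \xi/\sigma), \tag{1.6}$$

where $m_p^* = m_p^*(\delta, \xi)$ is defined as the unique positive solution of the equation

$$\frac{m}{1 + m/\delta} = M_p\!\left( \frac{\xi}{(1 + m/\delta)^{1/2}} \right). \tag{1.7}$$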

Again, the precise formula involves $M_p(\cdot)$, a classical quantity in statistical decision theory.
See Figure 8 for a display of the minimax AMSE as a function of $p$ and $\xi$.

Our results include several other precise formulas; our approach is able to evaluate a
number of operationally important quantities:

• The least-favorable object, i.e. the sparse estimand $x_0$ which causes maximal difficulty
for the LASSO; Eqs (4.4), (5.5), (6.6).



^2 It would be more notationally correct to write $\hat{x}_\lambda^{(n,N)}$ since the full problem size involves both $n$ and $N$,
but we ordinarily have in mind a specific value $\delta \sim n/N$, hence $n$ is not really free to vary independent of
$N$.
• The maximin tuning, the actual choice of penalization which minimizes the AMSE
when Nature chooses the least-favorable distribution; Eqs (4.3), (5.6), (6.16).

• Various operating characteristics, including the AMSE of reconstruction, and the
limiting $\ell_p$ norms of the reconstruction.

Various figures and tables present precise calculations which one can make using the
results of this paper. Figure 5 shows the minimax AMSE as a function of $\delta > 0$, for the
noiseless case $z = 0$ with fixed $\xi = 1$, while Figure 8 gives the minimax AMSE as a function
of $\xi$ for fixed $\delta = 1/4$, for the noisy case where the mean-square value of $z$ is $\sigma^2$.



1.2 Novel Interpretations 

Our precise formulas provide not only accurate numerical information, but also rather
surprising insights. The appearance of the classical quantity $M_p$ in these formulas tells
us that a noiseless compressed sensing problem, with nonsquare sensing matrix $A$ having
$n < N$, is explicitly connected with the MSE in a very simple noisy problem where $n = N$ and
$A$ is square - in fact, the identity(!) - cf. Eq. (1.5). On the other hand, a noisy compressed
sensing problem with $n < N$ and so $A$ nonsquare is explicitly connected with a seemingly
trivial problem, where $n = N$ and $A$ is the identity, but the noise level is different than in
the compressed sensing problem - in fact higher - cf. Eqs. (1.6), (1.7). Conclusion:

Slogan: In both the noisy and noiseless cases, undersampling is effectively
equivalent to adding noise to complete observations.^3

While [DTDS06] and [LDSP08] formulated heuristics and provided empirical evidence about
this connection, the results here (and in the companion papers [DMM09, DMM10]) provide
the only theoretical derivation of such a connection.

Established research tools for understanding compressed sensing - for example estimates
based on the restricted isometry property [CT05, CRT06] - provide upper bounds on the
mean square error but do not allow one to suspect that such striking connections hold. In
fact we use a very different approach from the usual compressed sensing literature. Our
methods join ideas from belief propagation message passing in information theory, and
minimax decision theory in mathematical statistics.



1.3 Complements and Extensions 
1.3.1 Weak $\ell_p$

Section 6 develops analogous results for compressed sensing in the weak-$\ell_p$ balls model,
where the object obeys a weak-$\ell_p$ rather than an $\ell_p$ constraint. Weak-$\ell_p$ balls are relevant
models for natural images and hence our results have applications in image reconstruction,
as we describe in Section 9.

^3 The formal equivalence of undersampling to simply adding noise is quite striking. It reminds us of ideas
from the so-called comparison of experiments in traditional statistical decision theory.
1.3.2 Reformulation of $\ell_p$ Balls

Our normalization of the error measure and of $\ell_p$ balls is somewhat different from what
has been called the $\ell_p$ case in earlier literature. We also impose a tightness condition not
present in earlier work. In exchange, we get precise results. For calibration of these results
see Section 7. From the practical point of view of obtaining accurate predictions about the
behavior of real systems, the present model has significant advantages. For more detail, see
Section 10.



2 Minimax Mean Squared Error of Soft Thresholding 

Consider a signal $x_0 \in \mathbb{R}^N$, and suppose that it satisfies the $\ell_2$-normalization
$N^{-1}\|x_0\|_2^2 \sim 1$ but also the $\ell_p$-constraint $\|x_0\|_p^p \le N \cdot \xi^p$ for small $\xi$ and $0 < p < 2$. To
see that this is a sparsity constraint, note that a typical 'dense' sequence, such as an iid
Gaussian sequence, cannot obey such a constraint for large $N$; in effect, smallness of $\xi$ rules
out sequences which have too many significantly nonzero values.

If we observed such a sparse sequence in additive Gaussian noise $y = x_0 + z$, where
$z_i \sim_{\rm iid} \mathsf{N}(0, 1)$, it is well known that we could approximately recover the vector by simple
thresholding - effectively, zeroing out the entries which are already close to zero. Consider
the soft-thresholding nonlinearity $\eta : \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}$. Given an observation $y \in \mathbb{R}$ and a
'threshold level' $\tau \in \mathbb{R}_+$, soft thresholding acts on a scalar as follows:

$$\eta(y; \tau) = \begin{cases} y - \tau & \text{if } y > \tau, \\ 0 & \text{if } -\tau \le y \le \tau, \\ y + \tau & \text{if } y < -\tau. \end{cases} \tag{2.1}$$

We apply it to a vector $y$ coordinatewise and get the estimate $\hat{x} = \eta(y; \tau)$.
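The scalar rule (2.1) and its coordinatewise application can be sketched in a few lines. The function names below are illustrative choices, not notation from the paper.

```python
def soft_threshold(y: float, tau: float) -> float:
    """Soft thresholding eta(y; tau) of Eq. (2.1): shrink toward zero by tau."""
    if y > tau:
        return y - tau
    if y < -tau:
        return y + tau
    return 0.0  # observations in [-tau, tau] are set to zero


def soft_threshold_vec(y, tau):
    """Apply eta(.; tau) to every coordinate of a vector (given as a list)."""
    return [soft_threshold(v, tau) for v in y]


x_hat = soft_threshold_vec([3.0, -0.4, 0.9, -2.5], 1.0)
```

Small observations are zeroed out and large ones are pulled toward zero by $\tau$, which is why the estimate is sparse whenever most coordinates of $y$ are noise-dominated.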

To analyze this procedure we can work in terms of scalar random variables. The empirical
distribution of $x_0$ is defined as

$$\nu_{x_0,N} \equiv \frac{1}{N} \sum_{i=1}^{N} \delta_{x_{0,i}}. \tag{2.2}$$

Define the random variables $X \sim \nu_{x_0,N}$ and $Z \sim \mathsf{N}(0, 1)$, with $X$ and $Z$ mutually independent.
We have the isometry:

$$N^{-1}\, \mathbb{E}\|\hat{x} - x_0\|_2^2 = \mathbb{E}\left\{ [\eta(X + Z; \tau) - X]^2 \right\}.$$

Hence, to analyze the behavior of thresholding under sparsity constraints, we can shift
attention from sequences in $\mathbb{R}^N$ to distributions.

So define the class of 'sparse' probability distributions over $\mathbb{R}$:

$$\mathcal{F}_p(\xi) \equiv \left\{ \nu \in \mathcal{P}(\mathbb{R}) : \nu(|X|^p) \le \xi^p \right\}, \tag{2.3}$$

where $\mathcal{P}(\mathbb{R})$ denotes the space of probability measures over the real line. Then $x_0$ satisfies
the $\ell_p$-constraint $\|x_0\|_p^p \le N \cdot \xi^p$ if and only if $\nu_{x_0,N} \in \mathcal{F}_p(\xi)$.

The central quantity for our formulae (1.5), (1.6) is the minimax mean square error
$M_p(\xi)$ defined now:
Figure 1: Minimax soft thresholding risk, $M_p(\xi)$, various $p$. Vertical axis: worst case
MSE over $\mathcal{F}_p(\xi)$. Horizontal axis: $\xi^p$. Red, green, blue, aqua curves correspond to $p =
0.1, 0.25, 0.50, 1.00$.



Definition 2.1. The minimax mean squared error of soft thresholding is defined by:

$$M_p(\xi) = \inf_{\tau \in \mathbb{R}_+}\ \sup_{\nu \in \mathcal{F}_p(\xi)} \mathbb{E}\left\{ [\eta(X + Z; \tau) - X]^2 \right\}, \tag{2.4}$$

where expectation on the right hand side is taken with respect to $X \sim \nu$ and $Z \sim \mathsf{N}(0,1)$
mutually independent.

This quantity has been carefully studied in [DJ94], particularly in the asymptotic regime
$\xi \to 0$. Figure 1 displays its behavior as a function of $\xi$ for several different values of $p$.

The quantity (2.4) can be viewed as the value of a game against Nature, where the
statistician chooses the threshold $\tau$, Nature chooses the distribution $\nu$, and the statistician
pays Nature an amount equal to the MSE. We use the following notation for the MSE of
soft thresholding, given a noise level $\sigma$, a signal distribution $\nu$ and a threshold level $\tau$:

$$\mathrm{mse}(\sigma^2; \nu, \tau) \equiv \mathbb{E}\left\{ [\eta(X + \sigma Z; \tau\sigma) - X]^2 \right\}, \tag{2.5}$$

where, again, expectation is with respect to $X \sim \nu$ and $Z \sim \mathsf{N}(0, 1)$ independent. Hence
the quantity on the right hand side of Eq. (2.4) - the game payoff - is just $\mathrm{mse}(1; \nu, \tau)$.

Evaluating the supremum in Eq. (2.4) might at first appear hopeless. In reality the
computation can be done rather explicitly using the following result.

Lemma 2.1. The least-favorable distribution $\nu_{p,\xi}$, i.e. the distribution forcing attainment
of the worst-case MSE, is supported on 3 points. Explicitly, consider the 3-point mixture
distribution

$$\nu_{\varepsilon,\mu} = (1 - \varepsilon)\,\delta_0 + \frac{\varepsilon}{2}\,\delta_{\mu} + \frac{\varepsilon}{2}\,\delta_{-\mu}. \tag{2.6}$$

Then the least-favorable distribution $\nu_{p,\xi}$ is the 3-point mixture $\nu_{\varepsilon_p(\xi),\mu_p(\xi)}$ for specific values
$\varepsilon_p(\xi)$, $\mu_p(\xi)$.

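In fact it seems the minimax problem in Eq. (2.4) has a saddlepoint, i.e. a pair
$(\nu_{p,\xi}, \tau_p(\xi)) \in \mathcal{P}(\mathbb{R}) \times \mathbb{R}_+$, such that

$$\mathrm{mse}(1; \nu_{p,\xi}, \tau) \ge \mathrm{mse}(1; \nu_{p,\xi}, \tau_p(\xi)) \ge \mathrm{mse}(1; \nu, \tau_p(\xi)) \qquad \forall \tau > 0,\ \forall \nu \in \mathcal{F}_p(\xi), \tag{2.7}$$

but we do not need or prove this fact here. The MSE is readily evaluated for the 3-point
distribution, yielding

$$\mathrm{mse}(1; \nu_{\varepsilon,\mu}, \tau) = (1 - \varepsilon)\left\{ 2(1 + \tau^2)\Phi(-\tau) - 2\tau\phi(\tau) \right\} + \varepsilon\left\{ \mu^2 + (1 + \tau^2 - \mu^2)[\Phi(-\mu - \tau) + \Phi(\mu - \tau)] + (\mu - \tau)\phi(\mu + \tau) - (\mu + \tau)\phi(-\mu + \tau) \right\}. \tag{2.8}$$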



Here and below, $\phi(z) = e^{-z^2/2}/\sqrt{2\pi}$ is the standard Gaussian density and $\Phi(x) = \int_{-\infty}^{x} \phi(z)\, dz$
is the Gaussian distribution function. Further, it is easy to check that the MSE is maximized
when the $\ell_p$ constraint is saturated, i.e. for

$$\varepsilon \mu^p = \xi^p. \tag{2.9}$$



Therefore one is left with the task of maximizing the right-hand side of Eq. (2.8) with
respect to $\varepsilon$ (for $\mu = \xi \varepsilon^{-1/p}$) and minimizing it with respect to $\tau$. This can be done quite
easily numerically for any given $\xi > 0$, yielding the values of $\tau_p(\xi)$, $\mu_p(\xi)$ and $\varepsilon_p(\xi)$ plotted
in Fig. 2. The minimax property is illustrated in Fig. 3.
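The numerical recipe just described can be sketched with a plain grid search over the three-point risk formula (2.8), with the constraint saturated as in (2.9). This is only a minimal sketch: the grids, their ranges and the function names are assumptions made for illustration, and the paper does not say which optimizer its authors used.

```python
import math


def phi(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mse_point(mu, tau):
    """Unit-noise risk of soft thresholding at tau for a point mass at mu
    (the bracketed terms of Eq. (2.8); mu = 0 gives the first bracket)."""
    return (mu * mu
            + (1.0 + tau * tau - mu * mu) * (Phi(-mu - tau) + Phi(mu - tau))
            + (mu - tau) * phi(mu + tau)
            - (mu + tau) * phi(tau - mu))


def minimax_soft_threshold(xi, p, step=0.02):
    """Grid search for min over tau of max over mu, with eps = (xi / mu)^p.

    Returns (value, tau, mu, eps). Grid ranges are illustrative assumptions.
    """
    taus = [step * k for k in range(1, 201)]    # tau in (0, 4]
    mus = [xi + step * k for k in range(600)]   # mu >= xi keeps eps <= 1
    best = (float("inf"), None, None, None)
    for tau in taus:
        risk_at_zero = mse_point(0.0, tau)
        worst = (-1.0, None, None)
        for mu in mus:
            eps = (xi / mu) ** p                # saturated constraint (2.9)
            val = (1.0 - eps) * risk_at_zero + eps * mse_point(mu, tau)
            if val > worst[0]:
                worst = (val, mu, eps)
        if worst[0] < best[0]:
            best = (worst[0], tau, worst[1], worst[2])
    return best
```

Calling `minimax_soft_threshold(0.1, 1.0)` should land close to the values quoted in the Figure 3 panel for $p = 1$, $\xi^p = 0.1$ (MSE about 0.15); a finer grid or a proper one-dimensional optimizer would sharpen the result.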
Important below will be the inverse function

$$M_p^{-1}(m) = \inf\left\{ \xi \in [0, \infty) : M_p(\xi) \ge m \right\}, \tag{2.10}$$

defined for $m \in (0,1)$, and depicted in Figure 4. The well-definedness of this function
follows from the next Lemma.

Lemma 2.2. The function $\xi \mapsto M_p(\xi)$ is continuous and strictly increasing for $\xi \in (0, \infty)$,
with $\lim_{\xi \to 0} M_p(\xi) = 0$, and $\lim_{\xi \to \infty} M_p(\xi) = 1$.

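Proof. Let $\mathrm{mse}_0(\mu, \tau) = \mathbb{E}\{[\eta(\mu + Z; \tau) - \mu]^2\}$ for $Z \sim \mathsf{N}(0, 1)$, so that $\mathrm{mse}(1; \nu, \tau) =
\int \mathrm{mse}_0(\mu, \tau)\, \nu(d\mu)$. Since $\mathrm{mse}_0(\mu, \tau) = \mathrm{mse}_0(-\mu, \tau)$, in this formula we can assume without
loss of generality that $\nu(\cdot)$ is supported on $\mathbb{R}_+$.

To show strict monotonicity, fix $\xi < \xi'$, let $\tau' = \tau_p(\xi')$ be the minimax threshold for
$\mathcal{F}_p(\xi')$, and let $\nu_\xi = \nu_{p,\xi}$ be the least favorable prior for $\mathcal{F}_p(\xi)$. Let $\nu' = S_{\xi'/\xi}\,\nu_\xi$ be the
measure in $\mathcal{F}_p(\xi')$ obtained by scaling $\nu_\xi$ up by a factor $\xi'/\xi$ (explicitly, for a measurable set
$C$, $\nu'(C) = \nu_\xi((\xi/\xi')C)$). Since $\xi'/\xi > 1$, strict monotonicity of $\mu \mapsto \mathrm{mse}_0(\mu, \tau)$ (e.g. [DJ94,
eq. A2.8]) shows that $\mathrm{mse}(1; \nu_\xi, \tau') < \mathrm{mse}(1; \nu', \tau')$. Consequently

$$M_p(\xi) \le \mathrm{mse}(1; \nu_\xi, \tau') < \mathrm{mse}(1; \nu', \tau') \le \sup_{\nu \in \mathcal{F}_p(\xi')} \mathrm{mse}(1; \nu, \tau') = M_p(\xi').$$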

We verify that $t \mapsto M_p(t^{1/p})$ is concave in $t$: combined with strict monotonicity, we can
then conclude that $M_p(\xi)$ is continuous. Indeed, the map $\nu \mapsto \mathrm{mse}(1; \nu, \tau)$ is linear in $\nu$ and
so $\mathrm{mse}^*(\nu) = \inf_\tau \mathrm{mse}(1; \nu, \tau)$ is concave in $\nu$. Hence $M_p(t^{1/p}) = \sup\{\mathrm{mse}^*(\nu) : \nu(|X|^p) \le t\}$
is also concave.

Figure 2: Least-favorable $\mu$ (upper frame) and corresponding minimax threshold $\tau$
(lower frame). Horizontal axes: $\xi^p$. Red, green, blue, aqua curves correspond to
$p = 0.1, 0.25, 0.50, 1.00$.

Figure 3: Saddlepoint property of minimax $\tau_p(\xi)$, $\xi^p = 1/10$, various $p$. Panel titles:
$p = 0.10$: MSE = 0.30, threshold $\tau$ = 1.20, $\mu$ = 3.56; $p = 0.25$: MSE = 0.26, $\tau$ = 1.28, $\mu$ = 3.14;
$p = 0.50$: MSE = 0.22, $\tau$ = 1.41, $\mu$ = 2.77; $p = 1.00$: MSE = 0.15, $\tau$ = 1.66, $\mu$ = 2.26.
Vertical axis: MSE at the three-point mixture $\nu_{\varepsilon,\mu}$. Horizontal axis: $\mu$. Vertical blue line:
least-favorable $\mu$, $\mu_p(\xi)$. Horizontal blue line: minimax MSE $M_p(\xi)$. At each value of $\mu$, the black
curve displays the corresponding MSE of soft thresholding with threshold at the minimax threshold
value $\tau_p(\xi)$, under the distribution $\nu_{\varepsilon,\mu}$ with $\varepsilon\mu^p = \xi^p$. The other two curves are for $\tau$ 10 percent
higher and 10 percent lower than the minimax value. In each case, the black curve (associated with
minimax $\tau$) stays below the horizontal line, while the red and blue curves cross above it,
illustrating the saddlepoint relation.

Figure 4: Inverse function $M_p^{-1}(m)$. Horizontal axis: $m$, the desired minimax mean square
error. Vertical axis: left-hand plot: $\xi = M_p^{-1}(m)$, the radius of the ball that attains it;
right-hand plot: $\log(\xi)$. Colored curves correspond to various choices of $p$.

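That $\lim_{\xi \to 0} M_p(\xi) = 0$ is shown in [DJ94], compare Lemma 2.3 below. For large $\xi$,
observe that

$$1 \ge M_p(\xi) \ge \widetilde{M}_p(\xi) \equiv \inf_{\eta}\ \sup_{\nu \in \mathcal{F}_p(\xi)} \mathbb{E}\left\{ [\eta(X + Z) - X]^2 \right\},$$

the minimax risk over all estimators $\eta$. Further $\widetilde{M}_p(\xi) \ge \widetilde{M}_\infty(\xi)$, the minimax risk for
estimation subject to the bounded mean constraint $|\mu| \le \xi$. That $\widetilde{M}_\infty(\xi) \to 1$ is shown,
for example, in [DLM90, Eq. (2.6)]. □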

Of particular interest is the case of extremely sparse signals, which corresponds to the
limit of small $\xi$. This regime was studied in detail in [DJ94], whose results we summarize
below.



Lemma 2.3 ([DJ94]). As $\xi \to 0$ the minimax pair $(\nu_{\varepsilon_p(\xi),\mu_p(\xi)}, \tau_p(\xi))$ in Eq. (2.4) obeys

$$\tau_p(\xi) = \sqrt{2\log(1/\xi^p)} \cdot \{1 + o(1)\},$$
$$\mu_p(\xi) = \sqrt{2\log(1/\xi^p)} \cdot \{1 + o(1)\}.$$

Further, the minimax mean square error is given, in the same limit, by

$$M_p(\xi) = \left(2\log(1/\xi^p)\right)^{1 - p/2} \xi^p \cdot \{1 + o(1)\}. \tag{2.11}$$

The asymptotics for $M_p(\xi)$ in the last lemma imply the following behavior of the inverse
function as $m \to 0$:

$$M_p^{-1}(m) = \left(2\log(1/m)\right)^{1/2 - 1/p} m^{1/p} \cdot \{1 + o(1)\}. \tag{2.12}$$
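To get a feel for these leading-order expressions, the sketch below evaluates the right-hand sides of Lemma 2.3, (2.11) and (2.12) with the $\{1 + o(1)\}$ factors dropped. The function names are illustrative, and the outputs are asymptotic approximations only: at moderate sparsity (say $\xi^p = 0.1$) they can differ noticeably from the exact minimax quantities.

```python
import math


def asymptotic_threshold(xi, p):
    """Leading-order tau_p(xi) (and mu_p(xi)) from Lemma 2.3."""
    return math.sqrt(2.0 * math.log(1.0 / xi ** p))


def asymptotic_minimax_mse(xi, p):
    """Leading-order M_p(xi) from Eq. (2.11)."""
    return (2.0 * math.log(1.0 / xi ** p)) ** (1.0 - p / 2.0) * xi ** p


def asymptotic_inverse(m, p):
    """Leading-order M_p^{-1}(m) from Eq. (2.12)."""
    return (2.0 * math.log(1.0 / m)) ** (0.5 - 1.0 / p) * m ** (1.0 / p)


# Round trip: (2.12) inverts (2.11) only up to the {1 + o(1)} factors,
# so the ratio below approaches 1 slowly as m decreases.
m_small = 1e-6
ratio = asymptotic_minimax_mse(asymptotic_inverse(m_small, 1.0), 1.0) / m_small
```

The round-trip ratio is close to, but not exactly, one, which reflects the logarithmic corrections hidden in the $o(1)$ terms.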
3 The asymptotic LASSO risk

In this section we discuss the high-dimensional limit of the LASSO mean square error for
a given sequence of instances $S = \{I_{n,N}\}$. Our treatment is mainly a summary of results
proved in [BM10] and [DMM10], adapted to the current context.

3.1 Convergent Sequences, and their AMSE 

We introduced the notion of sequence of instances as a very general, almost structure-free 
notion; but certain special sequences play a distinguished role. 

Definition 3.1. Convergent sequence of problem instances. The sequence of problem
instances $S = \{(x_0^{(N)}, z^{(n)}, A^{(n,N)})\}_{n,N}$ is said to be a convergent sequence if $n/N \to \delta \in
(0, \infty)$, and in addition the following conditions hold:

(a) Convergence of object marginals. The empirical distribution of the entries of
$x_0^{(N)}$ converges weakly to a probability measure $\nu$ on $\mathbb{R}$ with bounded second moment.
Further $N^{-1}\|x_0^{(N)}\|_2^2 \to \mathbb{E}_\nu X^2$.

(b) Convergence of noise marginals. The empirical distribution of the entries of
$z^{(n)}$ converges weakly to a probability measure $\omega$ on $\mathbb{R}$ with bounded second moment.
Further $n^{-1}\|z^{(n)}\|_2^2 \to \mathbb{E}_\omega Z^2 = \sigma^2$.

(c) Normalization of matrix columns. If $\{e_j\}_{1 \le j \le N}$, $e_j \in \mathbb{R}^N$, denotes the standard
basis, then $\max_{j \in [N]} \|A^{(n,N)} e_j\|_2,\ \min_{j \in [N]} \|A^{(n,N)} e_j\|_2 \to 1$, as $N \to \infty$, where $[N] =
\{1, 2, \ldots, N\}$.

We shall say that $S$ is a convergent sequence of problem instances, and will write $S \in
CS(\delta, \nu, \omega, \sigma)$ to make explicit the limit objects.

Next we need to introduce or recall some notations. The mean square error for scalar
soft thresholding was already introduced in the previous section, cf. Eq. (2.5), and denoted
by $\mathrm{mse}(\sigma^2; \nu, \tau)$. The second is the following state evolution map

$$\Psi(m; \delta, \sigma, \nu, \tau) \equiv \mathrm{mse}\!\left( \sigma^2 + \frac{1}{\delta}\, m;\ \nu, \tau \right). \tag{3.1}$$

This is the mean square error for soft thresholding, when the noise variance is $\sigma^2 + m/\delta$.
The addition of the last term reflects the increase of 'effective noise' in compressed sensing
as compared to simple denoising, due to the undersampling. In order to have a shorthand
for the latter, we define noise plus interference to be

$$\mathrm{npi}(m; \delta, \sigma) \equiv \sigma^2 + \frac{m}{\delta}. \tag{3.2}$$

Whenever the arguments $\delta, \sigma, \nu, \tau$ are clear from the context in the above functions, we
will drop them and write, with an abuse of notation, $\Psi(m)$ and $\mathrm{npi}(m)$.

Finally, we need to introduce the following calibration relation. Given $\tau \in \mathbb{R}_+$, let $m_*(\tau)$
be the largest positive solution of the fixed point equation

$$m = \Psi(m; \delta, \sigma, \nu, \tau) \tag{3.3}$$
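(of course $m_*$ depends on $\delta, \sigma, \nu$ as well but we'll drop this dependence unless necessary).
Such a solution is finite for all $\tau > \tau_0$ for some $\tau_0 = \tau_0(\delta)$. The corresponding LASSO
parameter is then given by

$$\lambda(\tau) \equiv \tau \sqrt{\mathrm{npi}_*}\, \left[ 1 - \frac{1}{\delta}\, \mathbb{P}\left\{ |X + \sqrt{\mathrm{npi}_*}\, Z| \ge \tau \sqrt{\mathrm{npi}_*} \right\} \right], \tag{3.4}$$

with $\mathrm{npi}_* = \mathrm{npi}(m_*(\tau))$. As shown in [BM10], $\tau \mapsto \lambda(\tau)$ establishes a bijection between
$\lambda \in (0, \infty)$ and $\tau \in (\tau_1, \infty)$ for some $\tau_1 = \tau_1(\delta) > \tau_0(\delta)$.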
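For a concrete feel of the calibration relation (3.4), the sketch below evaluates $\lambda(\tau)$ when the signal distribution is the three-point mixture $\nu_{\varepsilon,\mu}$ of Eq. (2.6), for which the exceedance probability has a closed form in terms of $\Phi$. It takes $\mathrm{npi}_*$ as a given input instead of solving the fixed point equation (3.3); the function names and the restriction to three-point priors are assumptions made for illustration.

```python
import math


def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exceed_prob(eps, mu, tau, s):
    """P{|X + s Z| >= tau * s} for X ~ (1 - eps) d_0 + (eps/2) d_mu + (eps/2) d_{-mu}."""
    at_zero = 2.0 * Phi(-tau)                       # X = 0 component
    at_mu = Phi(mu / s - tau) + Phi(-mu / s - tau)  # X = +mu or -mu, by symmetry
    return (1.0 - eps) * at_zero + eps * at_mu


def lasso_lambda(tau, npi, delta, eps, mu):
    """Calibration (3.4): LASSO parameter for threshold tau, given npi_* = npi > 0."""
    s = math.sqrt(npi)
    return tau * s * (1.0 - exceed_prob(eps, mu, tau, s) / delta)


lam_null = lasso_lambda(1.5, 1.0, 0.5, 0.0, 3.0)     # no signal mass (eps = 0)
lam_sparse = lasso_lambda(1.5, 1.0, 0.5, 0.05, 3.0)  # 5% of mass at +/- 3
```

The bracket in (3.4) is one minus the expected fraction of coordinates passing the threshold, rescaled by $1/\delta$, so adding signal mass (which raises that fraction) lowers the calibrated $\lambda$ for the same $\tau$.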

The basic high-dimensional limit result can be stated as follows. 

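Theorem 3.1. Let $S = \{I_{n,N}\} = \{(x_0^{(N)}, z^{(n)}, A^{(n,N)})\}_{n,N}$ be a convergent sequence of
problem instances, $S \in CS(\delta, \sigma, \nu, \omega)$, and assume also that the matrices $A^{(n,N)}$ are sampled
from $\mathrm{GAUSS}(n, N)$. Denote by $\hat{x}_\lambda^{(N)}$ the LASSO estimator for instance $I_{n,N}$, $\lambda > 0$, and
let $\psi : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a locally-Lipschitz function with $|\psi(x_1, x_2)| \le C(1 + x_1^2 + x_2^2)$ for all
$x_1, x_2 \in \mathbb{R}$.

Then, almost surely

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \psi\left( \hat{x}_{\lambda,i}, x_{0,i} \right) = \mathbb{E}\left\{ \psi\left( \eta(X + \sqrt{\mathrm{npi}_*}\, Z;\ \tau_* \sqrt{\mathrm{npi}_*}),\ X \right) \right\}, \tag{3.5}$$

where $\mathrm{npi}_* = \mathrm{npi}(m_*)$, $Z \sim \mathsf{N}(0, 1)$ is independent of $X \sim \nu$, $\tau_* = \tau_*(\lambda)$ is given by the
calibration relation described above, and $m_*$ is the largest positive solution of the fixed point
equation $m = \Psi(m; \delta, \sigma, \nu, \tau_*)$.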

3.2 Discussion and further properties 

In the next pages we will repeatedly use the shorthand $\mathrm{HFP}(\Psi)$ to denote the largest
positive solution of the fixed point equation $m = \Psi(m; \delta, \sigma, \nu, \tau)$, where we may suppress
the secondary parameters $(\delta, \sigma, \nu, \tau)$ and simply write $\Psi(m)$. Formally

$$\mathrm{HFP}(\Psi) = \sup\{ m \ge 0 : \Psi(m) \ge m \}. \tag{3.6}$$
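The definition (3.6) suggests a simple numerical recipe, sketched below for the three-point prior $\nu_{\varepsilon,\mu}$ of Eq. (2.6): evaluate $\Psi$ through the closed-form risk (2.8), find a point $m_0$ with $\Psi(m_0) \le m_0$, then iterate $m \leftarrow \Psi(m)$. Because $\Psi$ is increasing and concave (Lemma 3.1), the iterates decrease monotonically to the largest fixed point. The doubling search, tolerances and function names are illustrative assumptions, not the authors' procedure.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mse_point(mu, tau):
    """Unit-noise soft-thresholding risk for a point mass at mu, cf. Eq. (2.8)."""
    return (mu * mu
            + (1.0 + tau * tau - mu * mu) * (Phi(-mu - tau) + Phi(mu - tau))
            + (mu - tau) * phi(mu + tau) - (mu + tau) * phi(tau - mu))


def mse(noise_var, eps, mu, tau):
    """mse(sigma^2; nu_{eps,mu}, tau) of Eq. (2.5), using the rescaling X -> X / sigma."""
    if noise_var == 0.0:
        return 0.0  # no noise and threshold tau * 0 = 0: the estimate is exact
    s = math.sqrt(noise_var)
    return noise_var * ((1.0 - eps) * mse_point(0.0, tau) + eps * mse_point(mu / s, tau))


def psi(m, delta, sigma, eps, mu, tau):
    """State evolution map (3.1)."""
    return mse(sigma ** 2 + m / delta, eps, mu, tau)


def hfp(delta, sigma, eps, mu, tau, tol=1e-12, max_iter=100000):
    """Largest fixed point (3.6), by monotone iteration from above."""
    m = 1.0
    while psi(m, delta, sigma, eps, mu, tau) > m:  # search for Psi(m0) <= m0
        m *= 2.0
        if m > 1e12:
            raise ValueError("no finite fixed point found; tau may be too small")
    for _ in range(max_iter):
        new_m = psi(m, delta, sigma, eps, mu, tau)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


m_star = hfp(delta=0.5, sigma=1.0, eps=0.05, mu=3.0, tau=1.5)
```

With $\sigma > 0$ the map satisfies $\Psi(0) > 0$, so the fixed point found this way is strictly positive; in the noiseless case $m = 0$ is always a fixed point and the iteration from above decides whether a larger one exists.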

In order to emphasize the role of parameters $\delta, \sigma, \nu, \tau$, we may also write
$\mathrm{HFP}(\Psi(\,\cdot\,; \delta, \sigma, \nu, \tau))$. We recall some basic properties of the mapping $\Psi$.

Lemma 3.1 ([DMM09, DMM10]). For fixed $\delta, \sigma, \nu, \tau$, the mapping $m \mapsto \Psi(m)$ defined on
$[0, \infty)$ is continuous, strictly increasing and concave. Further $\Psi(0) \ge 0$ with $\Psi(0) = 0$ if
and only if $\sigma = 0$. Finally, there exists $\tau_0 = \tau_0(\delta)$ such that $\lim_{m \to \infty} \frac{d\Psi}{dm}(m) < 1$ if and only
if $\tau > \tau_0$.

By specializing Theorem 3.1 to the case $\psi(x_1, x_2) = (x_1 - x_2)^2$ and using the fixed point
condition $m_* = \Psi(m_*; \delta, \sigma, \nu, \tau_*)$ we obtain immediately the following.

Corollary 3.1. Let $S \in CS(\delta, \sigma, \nu, \omega)$ be a convergent sequence of problem instances, and
further assume that $A^{(n,N)} \sim \mathrm{GAUSS}(n, N)$. Denote by $\hat{x}_\lambda^{(N)}$ the LASSO estimator for
problem instance $I_{n,N}$, with $\lambda > 0$. Then, almost surely

$$\lim_{N \to \infty} \frac{1}{N} \|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2 = m_*, \tag{3.7}$$

where $m_* = \mathrm{HFP}(\Psi(\,\cdot\,; \delta, \sigma, \nu, \tau_*))$, and $\tau_* = \tau_*(\lambda)$ is fixed by the calibration relation (3.4).






3.3 AMSE over General Sequences 



Corollary 3.1 determines the asymptotic mean square error for convergent sequences $S \in
CS(\delta, \sigma, \nu)$. The resulting expression depends on $\delta, \sigma, \nu$, and is denoted $\mathrm{AMSE}_{SE}(\lambda; \delta, \sigma, \nu)$.
We have

$$\mathrm{AMSE}_{SE}(\lambda; \delta, \sigma, \nu) = \mathrm{HFP}(\Psi(\,\cdot\,; \delta, \sigma, \nu, \tau_*)). \tag{3.8}$$

The introduction considered instead the asymptotic mean square error $\mathrm{AMSE}(\lambda; S)$ along
general, not necessarily convergent sequences of problem instances in the standard $\ell_p$ problem
suite $S \in \mathcal{S}_p(\delta, \xi, \sigma)$, cf. Eq. (1.4). Given a sequence $S \in \mathcal{S}_p(\delta, \xi, \sigma)$, we let

$$\mathrm{AMSE}(\lambda; S) = \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left\{ \|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2 \right\}. \tag{3.9}$$

Below we will often omit the subscript SE on $\mathrm{AMSE}_{SE}$, thereby using the same notation
for the state evolution quantity (3.8) and the sequence quantity (3.9). This abuse is justified
by the following key fact. The asymptotic mean square error along any sequence of instances
can be represented by the formula $\mathrm{AMSE}_{SE}(\lambda; \delta, \sigma, \nu)$, for a suitable $\nu$, provided the sensing
matrices $A^{(n,N)}$ have i.i.d. Gaussian entries. Before stating this result formally, we recall
that the definition of the sparsity class $\mathcal{F}_p(\xi)$ was given in Eq. (2.3).



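Proposition 3.1. Let $S$ be any (not necessarily convergent) sequence of problem instances
in $\mathcal{S}_p(\delta, \xi, \sigma)$. Then there exists a probability distribution $\nu \in \mathcal{F}_p(\xi)$ such that

$$\mathrm{AMSE}(\lambda; S) = \mathrm{AMSE}_{SE}(\lambda; \delta, \sigma, \nu), \tag{3.10}$$

and both sides are given by the fixed point of the one-dimensional map $\Psi$, namely
$\mathrm{HFP}(\Psi(\,\cdot\,; \delta, \sigma, \nu, \tau_*))$. Further, for each $\epsilon > 0$,

$$\limsup_{N \to \infty}\ \mathbb{P}\left\{ \frac{1}{N} \|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2 > \mathrm{AMSE}_{SE}(\lambda; \delta, \sigma, \nu) + \epsilon \right\} = 0. \tag{3.11}$$

Conversely, for any $\nu \in \mathcal{F}_p(\xi)$, there exists a sequence of instances $S \in \mathcal{S}_p(\delta, \xi, \sigma)$, such
that $\mathrm{AMSE}(\lambda; S) = \mathrm{AMSE}_{SE}(\lambda; \delta, \sigma, \nu)$ along that sequence.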

Proof. Given the sequence of problem instances, $S = \{(x_0^{(N)}, z^{(n)}, A^{(n,N)})\}_{n,N}$, extract a
subsequence along which the expected mean square error has a limit equal to the lim sup in
Eq. (1.4). We will then extract a further subsequence that is a convergent subsequence of
problem instances, in the sense of Definition 3.1, hence proving the direct part of our claim,
by virtue of Corollary 3.1. (Convergence of the expectation of $\|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2/N$ follows
from almost sure convergence together with the fact that $\|x_0^{(N)}\|_2^2/N$ is uniformly bounded
by assumption and $\|\hat{x}_\lambda^{(N)}\|_2^2/N$ is uniformly bounded by Lemma 3.3 in [BM10].)



Let $\nu_{x_0,N}$ be the empirical distribution of $x_0^{(N)}$ as in (2.2). Since $S \in \mathcal{S}_p(\delta, \xi, \sigma)$, we have
$\nu_{x_0,N}(|X|^p) \le \xi^p$, hence the family $\{\nu_{x_0,N}\}$ is tight, and along a further subsequence the
empirical distributions of $x_0^{(N)}$ converge weakly, to a limit $\nu$, say. Again by $S \in \mathcal{S}_p(\delta, \xi, \sigma)$,
the empirical distributions of $z^{(n)}$ are tight (assumption $z \in \mathcal{Z}^2(\sigma)$ entails $\|z^{(n)}\|_2^2/n \to \sigma^2$);
we extract yet another subsequence along which they converge, to $\omega$, say.
We are left with a subsequence we shall label $\{(n_k, N_k)\}_{k \ge 1}$. We wish to prove for
this sequence (a)-(c) of Definition 3.1. Property (c) in Definition 3.1, the convergence
of column norms, is well known to hold for random matrices with iid Gaussian entries
(and easy to show). We are left to show (a) and (b), i.e. that $\nu_{x_0,N_k}(X^2) \to \nu(X^2)$ and
$\nu_{z,n_k}(X^2) \to \omega(X^2)$ along this sequence. Convergence of second moments follows since

$$\lim_{k \to \infty} \nu_{x_0,N_k}(X^2) = \lim_{k \to \infty} \nu_{x_0,N_k}\left( X^2\, \mathbb{I}\{|X| \le M\} \right) + \mathrm{err}_M = \nu\left( X^2\, \mathbb{I}\{|X| \le M\} \right) + \mathrm{err}_M,$$

where we used the dominated convergence theorem, and where, by the uniform integrability
property of sequences $x$ in $\mathcal{X}_p(\xi)$, $\mathrm{err}_M \downarrow 0$ as $M \to \infty$.



The limit in probability (3.11) follows by very similar arguments and we omit it here. 



The converse is proved by taking $x_0^{(N)}$ to be a vector with iid components $x_{0,i}^{(N)} \sim \nu$.
The empirical distributions $\nu_{x_0,N}$ then converge almost surely to $\nu$ by the Glivenko-Cantelli
theorem. Convergence of second moments follows from the strong law of large numbers. □

3.4 Intuition and relation to AMP algorithm 



Theorem 3.1 implies that, in the high-dimensional limit, vector estimation through the
LASSO can be effectively understood in terms of $N$ uncoupled scalar estimation problems,
provided the noise is augmented by an undersampling-dependent increment. A natural question
is whether one can construct, starting from the vector of measurements $y = (y_1, \ldots, y_n)$
(which are intrinsically 'joint' measurements of $x_1, \ldots, x_N$), a collection of $N$ uncoupled
measurements of $x_1, \ldots, x_N$.



A deeper intuition about this question and Theorem 3.1 can be developed by considering
the approximate message passing (AMP) algorithm first introduced in [DMM09]. At one
given problem instance (i.e. frozen choice of $(n,N)$) we omit the superscript $(N)$. The
algorithm produces a sequence of estimates $\{x^0, x^1, x^2, \dots\}$ in $\mathbb{R}^N$, by letting $x^0 = 0$ and,
for each $t \ge 0$,

    $z^t = y - A x^t + \frac{\|x^t\|_0}{n}\, z^{t-1}$,    (3.12)

    $x^{t+1} = \eta(x^t + A^{T} z^t; \theta_t)$,    (3.13)

where $\|x^t\|_0$ is the size of the support of $x^t$. Here $\{z^t\}_{t\ge 0} \subseteq \mathbb{R}^n$ is a sequence of residuals
and $\theta_t$ a sequence of thresholds.

As shown in [BM11], the vector $x^t + A^{T} z^t$ is distributed asymptotically (large $t$) as
$x_0 + w^t$ with $w^t \in \mathbb{R}^N$ a vector with i.i.d. components $\sim N(0, \sigma_t^2)$ independent of $x_0$.
(Here the convergence is to be understood in the sense of finite-dimensional marginals.) In
other words, the vector $x^t + A^{T} z^t$ produced by the AMP algorithm is effectively a vector of
i.i.d. uncoupled observations of the signal $x_0$.

The second key point is that the AMP algorithm is tightly related to the LASSO. First
of all, fixed points of AMP (for a fixed value of the threshold $\theta_t = \theta_*$) are minimizers of the
LASSO cost function and vice versa, provided $\theta_*$ is calibrated with the regularization
parameter $\lambda$ according to the following relation

    $\lambda = \theta_* \left(1 - \frac{\|\hat{x}_\lambda\|_0}{n}\right)$,    (3.14)

with $\hat{x}_\lambda$ the LASSO minimizer or, equivalently, the AMP fixed point. Further, [BM10]
proved that (for Gaussian sensing matrices $A$) the AMP estimates do converge to the
LASSO minimizer provided the sequence of thresholds is chosen according to the policy

    $\theta_t = \tau \sigma_t$,    (3.15)

for a suitable $\tau > 0$ depending on $\lambda$ [BM10, DMM10]. Finally, the effective noise-plus-
interference level $\sigma_t$ can be estimated in several ways, a simple one being $\hat{\sigma}_t^2 = \|z^t\|^2/n$.
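
The iteration (3.12)-(3.13) with the threshold policy (3.15) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names `soft_threshold` and `amp`, the problem sizes, the value of `tau`, the convention $z^{-1} = 0$, and the $N(0,1/n)$ normalization of the entries of `A` are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, theta):
    """Entrywise soft thresholding eta(v; theta)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def amp(y, A, tau, n_iter=50):
    """AMP with thresholds theta_t = tau * sigma_hat_t, sigma_hat_t^2 = ||z^t||^2 / n."""
    n, N = A.shape
    x = np.zeros(N)   # x^0 = 0
    z = np.zeros(n)   # assumed convention z^{-1} = 0
    for _ in range(n_iter):
        # (3.12): residual, including the ||x^t||_0 / n memory term
        z = y - A @ x + (np.count_nonzero(x) / n) * z
        sigma_hat = np.linalg.norm(z) / np.sqrt(n)
        # (3.13): thresholded update
        x = soft_threshold(x + A.T @ z, tau * sigma_hat)
    return x

# Hypothetical noiseless test instance: delta = n/N = 0.5, 5% nonzero entries.
rng = np.random.default_rng(0)
n, N, k = 500, 1000, 50
A = rng.normal(size=(n, N)) / np.sqrt(n)   # columns have roughly unit norm
x0 = np.zeros(N)
x0[rng.choice(N, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = A @ x0
x_hat = amp(y, A, tau=2.0)
rel_err = np.linalg.norm(x_hat - x0) / np.linalg.norm(x0)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The only state carried between iterations is the pair $(x^t, z^{t-1})$, which is what makes each step cost two matrix-vector products.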



4 Minimax MSE over $\ell_p$ Balls, Noiseless Case

In this section we state results for the noiseless case, $y = A x_0$, where $A$ is $n \times N$ and $x_0$
obeys an $\ell_p$ constraint. As mentioned in the introduction, our results hold in the asymptotic
regime where $n/N \to \delta \in (0, 1)$.



4.1 Main Result 

Let $S = \{(x_0^{(N)}, z^{(n)}, A^{(n,N)})\}_{n,N}$ be a sequence of noiseless problem instances ($z^{(n)} = 0$: no
noise is added to the measurements) with Gaussian sensing matrices $A^{(n,N)} \sim \mathrm{GAUSS}(n, N)$.
Define the minimax LASSO mean square error as

    $M_p^*(\delta,\xi) = \sup_{S \in \mathcal{S}_p(\delta,\xi,0)} \inf_{\lambda \in \mathbb{R}_+} \mathrm{AMSE}(\lambda; S)$.    (4.1)

Theorem 4.1. Fix $\delta \in (0, 1)$, $\xi > 0$. The minimax AMSE obeys:

    $M_p^*(\delta,\xi) = \frac{\delta\, \xi^2}{M_p^{-1}(\delta)^2}$.    (4.2)

Further we have:

Minimax Threshold. The minimax threshold $\lambda^*(\delta,\xi)$ is given by the calibration relation
(3.4) with $\tau = \tau^*(\delta,\xi)$ determined as follows (notice in particular that this is independent
of $\xi$):

    $\tau^*(\delta,\xi) = \tau_p(M_p^{-1}(\delta))$.    (4.3)



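Least Favorable $\nu$. The least-favorable distribution is a 3-point distribution $\nu^* = \nu_{\varepsilon^*,\mu^*}$
(cf. Eq. (2.6)) with

    $\mu^*(\delta,\xi) = \frac{\xi\, \mu_p(M_p^{-1}(\delta))}{M_p^{-1}(\delta)}$,    $\varepsilon^*(\delta,\xi) = \frac{\xi^p}{(\mu^*)^p}$.    (4.4)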

Saddlepoint. The above quantities obey a saddlepoint relation. Put for short $\mathrm{AMSE}(\lambda;\nu)$
in place of $\mathrm{AMSE}(\lambda; \delta, \nu, \sigma = 0)$. The minimax AMSE obeys

    $M_p^*(\delta,\xi) = \mathrm{AMSE}(\lambda^*;\nu^*)$

and

    $\mathrm{AMSE}(\lambda^*;\nu^*) \le \mathrm{AMSE}(\lambda;\nu^*)$,  $\forall \lambda > 0$,    (4.5)
    $\mathrm{AMSE}(\lambda^*;\nu^*) \ge \mathrm{AMSE}(\lambda^*;\nu)$,  $\forall \nu \in \mathcal{F}_p(\xi)$.    (4.6)

Figure 5: Minimax MSE $M_p^*(\delta,1)$. We assume here $\xi = 1$; curves show log MSE as a
function of $\delta$. Consistent with $\delta \to 0$ asymptotic theory, the curves are nearly scaled copies
of each other.



4.2 Interpretation 

Figure 5 presents the function $M_p^*(\delta,\xi = 1)$ on a logarithmic scale. As the reader can see,
there is a substantial increase in the minimax risk as $\delta \to 0$, which agrees with our intuitive
picture that the reconstruction becomes less accurate for small $\delta$ (high undersampling).

The asymptotic properties of $M_p^*(\delta,1)$ in the high undersampling regime ($\delta \to 0$) can
be derived using Lemma 2.3. From Eq. (2.12) we have

    $M_p^*(\delta, 1) = \delta^{1-2/p}\,(2\log(\delta^{-1}))^{2/p-1}\,\{1 + o_\delta(1)\}$,  $\delta \to 0$.

Hence, when plotting $\log M_p^*(\delta, 1)$, as we do here, we should see graphs of the form

    $\log M_p^*(\delta, 1) = (1 - 2/p) \cdot [\log\delta - \log\log(\delta^{-1}) - \log 2] + o_\delta(1)$,  $\delta \to 0$.

In particular the curves should look 'all the same' at small $\delta$, except for scaling; this is
qualitatively consistent with Figure 5 even at larger $\delta$.

Another useful prediction can be obtained by working out the asymptotics of the
minimax threshold $\lambda^*(\delta,\xi)$. Using Eq. (4.3) as well as the calibration relation (3.4), we get, as
$\delta \to 0$,

    $\lambda^*(\delta,\xi) = \xi \left(\frac{2\log(1/\delta)}{\delta}\right)^{1/p} \{1 + o_\delta(1)\}$.    (4.7)

4.3 Proof of Theorem 4.1

We will focus on proving Eq. (4.2), since the other points follow straightforwardly. By
Proposition 3.1, we have the equivalent characterization

    $M_p^*(\delta,\xi) = \sup_{\nu \in \mathcal{F}_p(\xi)} \inf_{\lambda \in \mathbb{R}_+} \mathrm{AMSE}(\lambda; \delta, \nu, \sigma = 0)$.    (4.8)

Further, by Corollary 3.1 we can use the mean square error expression given there, and
because of the monotone nature of the calibration relation, we can minimize over the threshold
$\tau$ instead of $\lambda$. We get therefore

    $M_p^*(\delta,\xi) = \sup_{\nu \in \mathcal{F}_p(\xi)} \inf_{\tau \in \mathbb{R}_+} \mathrm{AMSE}_{SE}(\tau; \delta, \nu, 0)$,    (4.9)

where

    $\mathrm{AMSE}_{SE}(\tau; \delta, \nu, 0) = m$,
    $m = \mathrm{mse}(m/\delta; \nu, \tau)$.    (4.10)

Recall that $\mathcal{P}(\mathbb{R})$ denotes the class of all probability distribution functions on $\mathbb{R}$. Define
the scaling operator $S_a : \mathcal{P}(\mathbb{R}) \to \mathcal{P}(\mathbb{R})$ by $(S_a\nu)(B) = \nu(B/a)$ for any Borel set $B$. For the
family of operators $\{S_a : a > 0\}$ we have the group properties

    $S_a \cdot S_b = S_{a\cdot b}$,  $S_1 = I$,  $S_a S_{a^{-1}} = S_1$.    (4.11)

In particular by the last property, for any $a > 0$, the operator $S_a : \mathcal{P}(\mathbb{R}) \to \mathcal{P}(\mathbb{R})$ is
one-to-one.

With this notation, we have the scale covariance property of the soft-thresholding mean
square error

    $\mathrm{mse}(\sigma^2; \nu, \tau) = \sigma^2 \cdot \mathrm{mse}(1; S_{1/\sigma}\nu, \tau)$,    (4.12)

transforming a general-noise-level problem into a noise-level-one problem. As a consequence
of Lemma 3.1, the map $\sigma^2 \mapsto \mathrm{mse}(\sigma^2; \nu, \tau)$ is (for fixed $\nu$, $\tau$) increasing and concave.
Therefore, the map $\sigma^2 \mapsto \mathrm{mse}(1; S_{1/\sigma}\nu, \tau)$ is strictly monotone decreasing. Also, the fixed point
Eq. (4.10) can be rewritten as

    $\delta = \mathrm{mse}(1; S_{\sqrt{\delta/m}}\,\nu, \tau)$,    (4.13)

where the solution is unique by strict monotonicity of $m \mapsto \mathrm{mse}(1; S_{\sqrt{\delta/m}}\,\nu, \tau)$.
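
The scale covariance property (4.12) can be checked numerically. The sketch below assumes that $\mathrm{mse}(\sigma^2;\nu,\tau)$ denotes $E\{[\eta(X + \sigma Z; \tau\sigma) - X]^2\}$ with $X \sim \nu$ and $Z \sim N(0,1)$ independent (the definition in Eq. (2.5) is not reproduced in this section, so this is an assumption); the helper names, the three-point stand-in for $\nu$, and the parameter values are illustrative only.

```python
import numpy as np

def soft_threshold(v, theta):
    """Entrywise soft thresholding eta(v; theta)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def mse_mc(sigma2, x, z, tau):
    """Monte Carlo estimate of mse(sigma^2; nu, tau), threshold tau*sigma (assumed convention)."""
    sigma = np.sqrt(sigma2)
    return np.mean((soft_threshold(x + sigma * z, tau * sigma) - x) ** 2)

rng = np.random.default_rng(1)
size = 100_000
# Samples from a three-point distribution nu (illustrative choice).
x = rng.choice([-3.0, 0.0, 3.0], p=[0.05, 0.90, 0.05], size=size)
z = rng.standard_normal(size)

sigma2, tau = 0.49, 1.5
sigma = np.sqrt(sigma2)
lhs = mse_mc(sigma2, x, z, tau)                 # mse(sigma^2; nu, tau)
rhs = sigma2 * mse_mc(1.0, x / sigma, z, tau)   # sigma^2 * mse(1; S_{1/sigma} nu, tau)
print(lhs, rhs)
```

Dividing the samples of $X$ by $\sigma$ produces samples from $S_{1/\sigma}\nu$, and because the same draws of $Z$ are reused the two sides agree up to floating-point rounding rather than only up to Monte Carlo error.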

We will prove Eq. (4.2) by obtaining an upper and a lower bound for $M_p^*(\delta,\xi)$. In the
following we assume without loss of generality that the infimum in Eq. (4.8) is achieved:

    $M_p^*(\delta,\xi) = \mathrm{AMSE}_{SE}(\tau_*; \delta, \nu_*, 0)$.    (4.14)

Further we will use the minimax conditions for soft thresholding, see Lemma 2.1:

    $\inf_{\tau \in \mathbb{R}_+} \mathrm{mse}(1; \nu, \tau) \le M_p(\xi)$,  $\forall \nu \in \mathcal{F}_p(\xi)$,    (4.15)
    $\sup_{\nu \in \mathcal{F}_p(\xi)} \mathrm{mse}(1; \nu, \tau) \ge M_p(\xi)$,  $\forall \tau \in \mathbb{R}_+$.    (4.16)

Figure 6: Illustration of the minimax fixed point property. Horizontal: input MSE $m$.
Vertical: output MSE for the state evolution map defined as per Eq. (3.1). Red diagonal:
output MSE equal to input MSE. Black vertical line: minimax HFP. Blue horizontal line:
minimax MSE. Black curve: MSE map at minimax threshold value and least-favorable
distribution. It crosses the diagonal at the minimax fixed point. Colored curves:
MSE maps at minimax threshold value and other three-point distributions. All other fixed
points occur below the minimax fixed point.

Figure 7: Comparisons of highest fixed point at power law distribution with minimax HFP.
Horizontal: input MSE $m$. Vertical: output MSE for the state evolution map defined
as per Eq. (3.1). Red diagonal: output MSE equal to input MSE. Red vertical: minimax MSE.
Black curve: MSE map at minimax threshold value and least-favorable distribution. It crosses
the diagonal at the minimax fixed point. Green curve: MSE map with same threshold, taken at
power law distribution calibrated to the same $E|X|^p = \xi^p$ constraint.

Upper bound on $M_p^*(\delta,\xi)$. Let $m_* = M_p^*(\delta,\xi) = \mathrm{AMSE}_{SE}(\tau_*; \delta, \nu_*, 0)$. By Eq. (4.10) and
(4.13) we have

    $\delta = \mathrm{mse}(1; S_{\sqrt{\delta/m_*}}\,\nu_*, \tau_*) = \inf_{\tau \in \mathbb{R}_+} \mathrm{mse}(1; S_{\sqrt{\delta/m_*}}\,\nu_*, \tau)$.    (4.17)

The second equality follows because otherwise there would exist $\tau_{**}$ with
$\mathrm{mse}(1; S_{\sqrt{\delta/m_*}}\,\nu_*, \tau_{**}) < \mathrm{mse}(1; S_{\sqrt{\delta/m_*}}\,\nu_*, \tau_*) = \delta$,
whence, by the monotonicity of $m \mapsto \mathrm{mse}(1; S_{\sqrt{\delta/m}}\,\nu_*, \tau_{**})$, it would follow that
$\mathrm{AMSE}_{SE}(\tau_{**}; \delta, \nu_*, 0) < \mathrm{AMSE}_{SE}(\tau_*; \delta, \nu_*, 0)$, which violates the minimax property (4.14).

Next notice that $S_{\sqrt{\delta/m_*}}\,\nu_* \in \mathcal{F}_p(\sqrt{\delta/m_*}\,\xi)$, whence by Eq. (4.15), we get

    $\delta \le M_p(\sqrt{\delta/m_*}\,\xi)$.    (4.18)

By the monotonicity of $\xi \mapsto M_p(\xi)$ this yields

    $m_* \le \frac{\delta\,\xi^2}{M_p^{-1}(\delta)^2}$.    (4.19)

Lower bound on $M_p^*(\delta,\xi)$. Again by Eq. (4.10) and (4.13) we have

    $\delta = \mathrm{mse}(1; S_{\sqrt{\delta/m_*}}\,\nu_*, \tau_*) = \sup_{\nu \in \mathcal{F}_p(\xi)} \mathrm{mse}(1; S_{\sqrt{\delta/m_*}}\,\nu, \tau_*(\nu))$,

with $\tau_*(\nu)$ the optimal threshold for distribution $\nu$ and the second equality following by an
argument similar to the one above (i.e. if this weren't true, there would be a different worst
distribution $\nu_{**}$, reaching a contradiction). But $\nu \in \mathcal{F}_p(\xi)$ implies $S_{\sqrt{\delta/m_*}}\,\nu \in \mathcal{F}_p(\sqrt{\delta/m_*}\,\xi)$,
whence

    $\delta = \sup_{\nu \in \mathcal{F}_p(\xi\sqrt{\delta/m_*})} \mathrm{mse}(1; \nu, \tau_*(\nu)) \ge M_p(\sqrt{\delta/m_*}\,\xi)$,    (4.20)

where the inequality follows by Eq. (4.16). The proof is finished by using again the
monotonicity of $\xi \mapsto M_p(\xi)$. □



5 Minimax MSE over $\ell_p$ Balls, Noisy Case

In this section we generalize the results of the previous section to the case of noisy
measurements with noise variance per coordinate equal to $\sigma^2$.

5.1 Main Result

Now let $\sigma > 0$ and consider sequences $S$ of noisy problem instances from the standard $\ell_p$
problem suite $S \in \mathcal{S}_p(\delta,\xi,\sigma)$; hence, in addition to the $\ell_p$ constraint $\|x_0^{(N)}\|_p^p \le N\xi^p$ and
each $A^{(n,N)} \sim \mathrm{GAUSS}(n,N)$, now the noise vectors $z^{(n)} \in \mathbb{R}^n$ are non-vanishing and have
norms satisfying $\|z^{(n)}\|^2/n \to \sigma^2 > 0$.

We define the minimax LASSO asymptotic mean square error as

    $M_p^*(\delta,\xi,\sigma) = \sup_{S \in \mathcal{S}_p(\delta,\xi,\sigma)} \inf_{\lambda \in \mathbb{R}_+} \mathrm{AMSE}(\lambda; S)$.    (5.1)

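By simple scaling of the problem we have, for any $\sigma > 0$,

    $M_p^*(\delta,\xi,\sigma) = \sigma^2 M_p^*(\delta,\xi/\sigma,1)$,    (5.2)

an observation which will be used repeatedly in the following.

Theorem 5.1. For any $\delta, \xi > 0$, let $m^* = m_p^*(\delta,\xi)$ be the unique positive solution of

    $\frac{m^*}{1 + m^*/\delta} = M_p\!\left(\frac{\xi}{(1 + m^*/\delta)^{1/2}}\right)$.    (5.3)

Then the LASSO minimax mean square error $M_p^*$ is given by:

    $M_p^*(\delta,\xi,\sigma) = \sigma^2\, m_p^*(\delta,\xi/\sigma)$.    (5.4)

Further, denoting by $\xi^* = (1 + m^*/\delta)^{-1/2}\,\xi/\sigma$, we have:

Least Favorable $\nu$. The least-favorable distribution is a 3-point mixture $\nu^* = \nu_{\varepsilon^*,\mu^*}$
(cf. Eq. (2.6)) with

    $\mu^*(\delta,\xi,\sigma) = \sigma \cdot (1 + m^*/\delta)^{1/2}\,\mu_p(\xi^*)$,    $\varepsilon^*(\delta,\xi,\sigma) = \frac{\xi^p}{(\mu^*)^p}$,    (5.5)

with $m^* = m_p^*(\delta,\xi/\sigma)$ given by the solution of Eq. (5.3).

Minimax Threshold. The minimax threshold $\lambda^*(\delta,\xi,\sigma)$ is given by the calibration relation
(3.4) with $\tau = \tau^*(\delta,\xi,\sigma)$ determined as follows:

    $\tau^*(\delta,\xi,\sigma) = \tau_p(\xi^*)$,    (5.6)

with $\tau_p(\cdot)$ the soft thresholding minimax threshold, $\xi^* = (1 + m^*/\delta)^{-1/2}\,\xi/\sigma$ and $\nu = \nu^*$
the least favorable distribution given above.

Saddlepoint. The above quantities obey a saddlepoint relation. Put for short $\mathrm{AMSE}(\lambda;\nu)$
in place of $\mathrm{AMSE}(\lambda; \delta, \nu, \sigma)$. The minimax AMSE obeys

    $M_p^*(\delta,\xi,\sigma) = \mathrm{AMSE}(\lambda^*;\nu^*)$

and

    $\mathrm{AMSE}(\lambda^*;\nu^*) \le \mathrm{AMSE}(\lambda;\nu^*)$,  $\forall \lambda > 0$,    (5.7)
    $\mathrm{AMSE}(\lambda^*;\nu^*) \ge \mathrm{AMSE}(\lambda^*;\nu)$,  $\forall \nu \in \mathcal{F}_p(\xi)$.    (5.8)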
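
The fixed-point equation (5.3) can be solved numerically once $M_p$ is available. The sketch below approximates $M_p(\xi)$ by a grid search, restricting Nature to three-point mixtures $(1-\varepsilon)\delta_0 + (\varepsilon/2)\delta_{\mu} + (\varepsilon/2)\delta_{-\mu}$ with $\varepsilon\mu^p = \xi^p$ (the form the theorem attributes to the least-favorable distribution), and uses the standard closed form for the risk of soft thresholding at a point mass. The function names, grid ranges, and the example values of $\delta$, $\xi$, $p$ are assumptions made for illustration, and the result is only as accurate as the grids.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def soft_risk(mu, tau):
    """E[(eta(mu + Z; tau) - mu)^2] for Z ~ N(0,1), in closed form."""
    return (1.0 + tau**2
            + (mu**2 - tau**2 - 1.0) * (norm.cdf(tau - mu) - norm.cdf(-tau - mu))
            - (tau - mu) * norm.pdf(tau + mu)
            - (tau + mu) * norm.pdf(tau - mu))

def minimax_risk(xi, p, n_tau=201, n_mu=300):
    """Grid approximation to M_p(xi) = inf_tau sup_nu mse(1; nu, tau) over three-point mixtures."""
    taus = np.linspace(0.0, 6.0, n_tau)[:, None]        # candidate thresholds
    mus = np.geomspace(xi, xi + 20.0, n_mu)[None, :]    # mu >= xi keeps eps <= 1
    eps = (xi / mus) ** p                               # eps * mu^p = xi^p
    risk = (1.0 - eps) * soft_risk(0.0, taus) + eps * soft_risk(mus, taus)
    return risk.max(axis=1).min()

def solve_m_star(delta, xi, p):
    """Solve m / (1 + m/delta) = M_p(xi / sqrt(1 + m/delta)) for m (case sigma = 1)."""
    def gap(m):
        s = 1.0 + m / delta
        return m / s - minimax_risk(xi / np.sqrt(s), p)
    hi = 1.0
    while gap(hi) < 0:      # left side increases to delta, right side decreases to ~0
        hi *= 2.0
    return brentq(gap, 0.0, hi, xtol=1e-8)

delta, p = 0.25, 0.5
m_star = solve_m_star(delta, 1.0, p)
print("m*(delta=0.25, xi=1):", m_star)
```

For a noise level $\sigma \ne 1$, Eq. (5.4) gives the minimax AMSE as `sigma**2 * solve_m_star(delta, xi / sigma, p)`.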



5.2 Interpretation 



Figure 8 provides a concrete illustration of Theorem 5.1. For various sparsity levels $\xi$
and undersampling factors $\delta$, the mean square error $M_p^*(\delta,\xi,\sigma)$ can be easily computed. As
expected, the result is monotone increasing in $\xi$ and decreasing in $\delta$. For a given target mean
square error, such plots allow one to determine the required number of linear measurements.

Figure 8: Minimax MSE $M_p^*(\delta,\xi,1)$, noisy case $\sigma = 1$. We assume here $\delta = 1/4$; curves
show MSE as a function of $\xi$.

Equations (5.3) and (5.4) are somewhat more complex than their noiseless counterpart.
For this reason, it is instructive to work out the $\sigma \to 0$ limit of $M_p^*(\delta,\xi,\sigma)$. By the basic scaling
relation (5.4), this is equivalent to computing the $\xi \to \infty$ limit of $M_p^*(\delta,\xi,1) = m_p^*(\delta,\xi)$.
Considering Eq. (5.3), it is easy to show that, for large $\xi$,

    $m_p^*(\delta,\xi) = c_0\,\xi^2 + c_1 + O(\xi^{-2})$.

Substituting in Eq. (5.3)

    $\delta \left(1 + \frac{\delta}{c_0\xi^2 + c_1 + O(\xi^{-2})}\right)^{-1} = M_p\!\left( (\delta/c_0)^{1/2} \left(1 + \frac{c_1 + \delta}{c_0\,\xi^2} + O(\xi^{-4})\right)^{-1/2} \right)$,

whence expanding for large $\xi$

    $\delta - \frac{\delta^2}{c_0\,\xi^2} + O(\xi^{-4}) = M_p((\delta/c_0)^{1/2}) - \frac{\delta^{1/2}}{2\,c_0^{3/2}\,\xi^2}\, M_p'((\delta/c_0)^{1/2})\,(c_1 + \delta) + O(\xi^{-4})$.

Imposing each order to vanish we get

    $c_0(\delta) = \frac{\delta}{M_p^{-1}(\delta)^2}$,    (5.9)

    $c_1(\delta) = -\delta + \frac{2\,\delta^{3/2}\, c_0^{1/2}}{M_p'((\delta/c_0)^{1/2})}$.    (5.10)

Our calculations can be summarized as follows.

Corollary 5.1. Fix a radius parameter $\xi$. As $\sigma^2 \to 0$, the asymptotic LASSO minimax
mean square error behaves as

    $M_p^*(\delta,\xi,\sigma) = \xi^2\, c_0(\delta) + \sigma^2\, c_1(\delta) + O(\sigma^4/\xi^2)$,    (5.11)

with $c_0$ and $c_1$ determined by Eqs. (5.9) and (5.10). In particular, in the high undersampling
regime $\delta \to 0$, we get

    $c_1(\delta) = \frac{(2-p)\,\delta}{p}\,\{1 + o_\delta(1)\}$.    (5.12)

The derivation of the asymptotic behavior (5.12) is a straightforward calculus exercise,
using Lemma 2.3.

The last Corollary shows that the noiseless case, cf. Theorem 4.1 and Eq. (4.2), is
recovered as a special case of the noisy case treated in this section. Further, leading
corrections due to small noise $\sigma^2 \ll \xi^2$ are explicitly described by the coefficient $c_1(\delta)$ given in
Eq. (5.10).

An alternative asymptotic of interest consists in fixing the noise level $\sigma$, and letting
$\xi/\sigma \to 0$. In this regime the solution of Eq. (5.3) yields, using Lemma 2.3,

    $m_p^*(\delta,\xi) = (2\log(1/\xi^p))^{1-p/2}\,\xi^p \cdot \{1 + o(1)\}$.    (5.13)

Substituting this expression in Theorem 5.1 we obtain the following.

Corollary 5.2. Fix a noise parameter $\sigma^2 > 0$. As $\xi \to 0$, the asymptotic LASSO minimax
mean square error behaves as

    $M_p^*(\delta,\xi,\sigma) = \sigma^{2-p}\,\xi^p \cdot (2\log((\sigma/\xi)^p))^{1-p/2} \cdot \{1 + o(1)\}$.    (5.14)

Further the minimax threshold value is given, in this limit, by

    $\lambda^* = \sigma \cdot \sqrt{2\log((\sigma/\xi)^p)}\;\{1 + o(1)\}$.    (5.15)



5.3 Proof of Theorem 5.1

The argument is structurally similar to the noiseless case. We will focus again on proving
the asymptotic expression for minimax error given in Eq. (5.4), since the other points of
the theorem follow easily. Using Proposition 3.1 and Corollary 3.1, the asymptotic mean
square error can be replaced by the expression given there and the minimization over $\lambda$ can
be replaced by a minimization over $\tau$:

    $M_p^*(\delta,\xi,\sigma) = \sup_{\nu \in \mathcal{F}_p(\xi)} \inf_{\tau \in \mathbb{R}_+} \mathrm{AMSE}_{SE}(\tau; \delta, \nu, \sigma)$,    (5.16)

where

    $\mathrm{AMSE}_{SE}(\tau; \delta, \nu, \sigma) = m$,
    $m = \mathrm{mse}(\sigma^2 + m/\delta; \nu, \tau)$.    (5.17)

By virtue of the scaling relation (5.2), we can focus on the case $\sigma = 1$. Define, for all
$m \ge 0$,

    $\mathsf{n}(m) = (1 + m/\delta)^{-1/2}$.    (5.18)

We then have, applying Eq. (5.17) for the case $\sigma = 1$,

    $\frac{m}{1 + m/\delta} = \mathrm{mse}(1; S_{\mathsf{n}(m)}\nu, \tau)$.    (5.19)



Notice that $m \mapsto m/(1 + m/\delta)$ is monotone increasing, and $m \mapsto \mathrm{mse}(1; S_{\mathsf{n}(m)}\nu, \tau)$ is
monotone decreasing (because $\sigma^2 \mapsto \mathrm{mse}(1; S_{1/\sigma}\nu, \tau)$ is decreasing as mentioned in the
previous section). Hence this equation has a unique non-negative solution provided $\delta >
\mathrm{mse}(1; \delta_0, \tau)$ (with $\delta_0$ the point mass at zero), which happens for all $\tau > \tau_0(\delta)$.

Assume without loss of generality that the minimax risk is achieved by the pair $(\tau_*, \nu_*)$.
Then

    $M_p^*(\delta,\xi,1) = \mathrm{AMSE}_{SE}(\tau_*; \delta, \nu_*, 1) = m_*$.    (5.20)

Then $m_*$ satisfies Eq. (5.19) with $\tau = \tau_*$ and $\nu = \nu_*$.

Upper bound on $M_p^*(\delta,\xi,1)$. By the last remarks, we have

    $\frac{m_*}{1 + m_*/\delta} = \mathrm{mse}(1; S_{\mathsf{n}(m_*)}\nu_*, \tau_*) = \inf_{\tau \in \mathbb{R}_+} \mathrm{mse}(1; S_{\mathsf{n}(m_*)}\nu_*, \tau)$.    (5.21)

The second equality follows from Eq. (5.20). Indeed if the equality did not hold, we could
find $\tau_{**} \in \mathbb{R}_+$ such that $\mathrm{mse}(1; S_{\mathsf{n}(m_*)}\nu_*, \tau_{**}) < \mathrm{mse}(1; S_{\mathsf{n}(m_*)}\nu_*, \tau_*)$. But by the
monotonicity of $m \mapsto m/(1 + m/\delta)$ and of $m \mapsto \mathrm{mse}(1; S_{\mathsf{n}(m)}\nu_*, \tau_{**})$, this would mean that
the corresponding fixed point $m_{**}$ is strictly smaller than $m_*$. This would contradict the
minimax assumption.

Since $S_{\mathsf{n}(m_*)}\nu_* \in \mathcal{F}_p(\mathsf{n}(m_*)\xi)$ we can now apply Eq. (4.15), getting

    $\frac{m_*}{1 + m_*/\delta} \le M_p(\mathsf{n}(m_*)\,\xi)$.    (5.22)

Again by monotonicity of $m \mapsto m/(1 + m/\delta)$ and of $\xi \mapsto M_p(\xi)$, this means that $m_*$ is
upper bounded by the solution of Eq. (5.3).

Lower bound on $M_p^*(\delta,\xi,1)$. Applying again Eq. (5.19) and an analogous argument as
above, we have

    $\frac{m_*}{1 + m_*/\delta} = \sup_{\nu \in \mathcal{F}_p(\xi)} \mathrm{mse}(1; S_{\mathsf{n}(m_*)}\nu, \tau_*(S_{\mathsf{n}(m_*)}\nu))$.    (5.23)

In the last expression $\tau_*(S_{\mathsf{n}(m_*)}\nu)$ is the optimal (minimal MSE) threshold for
distribution $S_{\mathsf{n}(m_*)}\nu$. For $\nu \in \mathcal{F}_p(\xi)$, $S_{\mathsf{n}(m_*)}\nu \in \mathcal{F}_p(\mathsf{n}(m_*)\xi)$. Further the map $S_{\mathsf{n}(m_*)} : \mathcal{F}_p(\xi) \to
\mathcal{F}_p(\mathsf{n}(m_*)\xi)$ is bijective. We thus have

    $\frac{m_*}{1 + m_*/\delta} = \sup_{\nu \in \mathcal{F}_p(\mathsf{n}(m_*)\xi)} \mathrm{mse}(1; \nu, \tau_*(\nu))$.    (5.24)

By Eq. (4.16), we thus have

    $\frac{m_*}{1 + m_*/\delta} \ge M_p(\mathsf{n}(m_*)\,\xi)$,    (5.25)

which implies that $m_*$ is lower bounded by the solution of Eq. (5.3). This finishes our
proof. □

6 Weak p-th Moment Constraints

Our results for $\ell_p$ constraints have natural counterparts for weak $\ell_p$ constraints. We recall
a standard definition for the weak-$\ell_p$ quasi-norm $\|\cdot\|_{w\ell_p}$. For a vector $x \in \mathbb{R}^N$, let $T_x(t) =
\{i \in \{1,\dots,N\} : |x_i| \ge t\}$ index the entries of $x$ with amplitude above threshold $t$.
Denoting by $|S|$ the cardinality of set $S$, we define

    $\|x\|_{w\ell_p} = \max_{t \ge 0}\, \big[\, t\, |T_x(t)|^{1/p} \,\big]$.    (6.1)

By Markov's inequality $\|x\|_{w\ell_p} \le \|x\|_p$: the weak $\ell_p$ quasi-norm is indeed weaker than the $\ell_p$
norm (quasi-norm, if $p < 1$). Weak-$\ell_p$ norms arise frequently in applied harmonic analysis,
as we discuss below.

As the reader no doubt expects, we can define a weak $\ell_p$ analogue to the $\ell_p$ case.

Definition 6.1. • Weak $\ell_p$ constraint. A sequence $x_0 = (x_0^{(N)})$ belongs to $\mathcal{X}_p^w(\xi)$ if
(i) $\|x_0^{(N)}\|_{w\ell_p}^p \le N\xi^p$, for all $N$; and (ii) there exists a sequence $B = \{B_M\}_{M \ge 0}$ such
that $B_M \to 0$, and for every $N$, $\sum_{i=1}^N (x_{0,i}^{(N)})^2\, \mathbb{I}(|x_{0,i}^{(N)}| > M) \le B_M N$.

• Standard Weak-$\ell_p$ Problem Suite. Let $\mathcal{S}_p^w(\delta,\xi,\sigma)$ denote the class of sequences of
problem instances $I_{n,N} = (x_0^{(N)}, z^{(n)}, A^{(n,N)})$ built from objects in weak $\ell_p$; in detail:

(i) $n/N \to \delta$;

(ii) $x_0 \in \mathcal{X}_p^w(\xi)$;

(iii) $z \in \mathcal{Z}^2(\sigma)$, and

(iv) $A^{(n,N)} \in \mathrm{GAUSS}(n,N)$.
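
The weak-$\ell_p$ quasi-norm of Eq. (6.1) is easy to evaluate: the maximum over $t$ is attained at one of the entry magnitudes, so it suffices to scan the magnitudes in decreasing order. The following sketch (function names and test vectors are illustrative assumptions) also checks the Markov bound $\|x\|_{w\ell_p} \le \|x\|_p$ on a random heavy-tailed vector.

```python
import numpy as np

def weak_lp_norm(x, p):
    """Weak-l_p quasi-norm: max over t of t * #{i : |x_i| >= t}^(1/p).

    With magnitudes sorted in decreasing order, this equals max_k |x|_(k) * k^(1/p).
    """
    mags = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    k = np.arange(1, mags.size + 1)
    return np.max(mags * k ** (1.0 / p))

def lp_norm(x, p):
    """Ordinary l_p (quasi-)norm."""
    return np.sum(np.abs(np.asarray(x, dtype=float)) ** p) ** (1.0 / p)

rng = np.random.default_rng(2)
x = rng.standard_cauchy(1000)       # heavy-tailed test vector
p = 0.5
w, s = weak_lp_norm(x, p), lp_norm(x, p)
print(w, s)                         # the weak quasi-norm never exceeds the l_p norm
flat = weak_lp_norm(np.ones(4), 1.0)   # all entries equal: weak and strong norms coincide
```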



6.1 Scalar Minimax Thresholding under Weak p-th Moment Constraints

The class of probability distributions corresponding to instances in the weak-$\ell_p$ problem
suite is

    $\mathcal{F}_p^w(\xi) = \big\{ \nu \in \mathcal{P}(\mathbb{R}) : \sup_{t \ge 0}\, t^p \cdot \nu(\{|X| \ge t\}) \le \xi^p \big\}$.    (6.2)

In particular, given a sequence $x_0 \in \mathcal{X}_p^w(\xi)$, the empirical distribution of each $x_0^{(N)}$ is in
$\mathcal{F}_p^w(\xi)$.

As in Section 2 we denote by $\mathrm{mse}(\sigma^2; \nu, \tau)$ the mean square error of scalar soft
thresholding for a given signal distribution $\nu$.

Definition 6.2. The minimax mean squared error under the weak p-th moment
constraint is

    $M_p^w(\xi) = \inf_{\tau \in \mathbb{R}_+} \sup_{\nu \in \mathcal{F}_p^w(\xi)} E\{[\eta(X + Z; \tau) - X]^2\}$,    (6.3)

where the expectation on the right hand side is taken with respect to $X \sim \nu$ and $Z \sim N(0, 1)$,
$X$ and $Z$ independent.

The collection of probability measures $\mathcal{F}_p^w(\xi)$ has a distinguished element: a most
dispersed one. In fact define the envelope function

    $H_{p,\xi}(t) = \inf_{\nu \in \mathcal{F}_p^w(\xi)} \nu(\{|X| < t\})$,    (6.4)

the envelope of achievable dispersion of the probability mass for elements of $\mathcal{F}_p^w(\xi)$. This
envelope can be computed explicitly, yielding

    $H_{p,\xi}(t) = \big(1 - (\xi/t)^p\big)_+$.    (6.5)

Indeed it is clear by definition that $\nu(\{|X| < t\}) \ge H_{p,\xi}(t)$. Further defining the CDF

    $F_{p,\xi}(x) = \frac12 + \frac12 H_{p,\xi}(|x|)$ for $x \ge 0$,
    $F_{p,\xi}(x) = \frac12 - \frac12 H_{p,\xi}(|x|)$ for $x < 0$,    (6.6)

and letting $\nu_{p,\xi}$ be the corresponding measure, we get for any $t > 0$, $\nu_{p,\xi}(\{|X| < t\}) =
F_{p,\xi}(t) - F_{p,\xi}(-t) = H_{p,\xi}(t)$. We therefore proved the following.

Lemma 6.1. The most dispersed symmetric probability measure in $\mathcal{F}_p^w(\xi)$ is $\nu_{p,\xi}$. This
distribution achieves the equality $\nu_{p,\xi}(\{|X| < t\}) = H_{p,\xi}(t) \le \nu(\{|X| < t\})$ for all $\nu \in
\mathcal{F}_p^w(\xi)$, and all $t > 0$.
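
The most dispersed distribution of Lemma 6.1 can be sampled by inverting the CDF (6.6): under $\nu_{p,\xi}$ the magnitude $|X|$ satisfies $P(|X| \ge t) = (\xi/t)^p$ for $t \ge \xi$, so $|X| = \xi\, U^{-1/p}$ with $U$ uniform on $(0,1]$. The sketch below, in which the function names, sample size and check points are illustrative assumptions, draws such samples and compares the empirical mass of $\{|X| < t\}$ with the envelope $H_{p,\xi}(t)$.

```python
import numpy as np

def envelope(t, p, xi):
    """Envelope H_{p,xi}(t) = (1 - (xi/t)^p)_+ of Eq. (6.5)."""
    return np.maximum(0.0, 1.0 - (xi / np.asarray(t, dtype=float)) ** p)

def sample_most_dispersed(p, xi, size, rng):
    """Draw from nu_{p,xi}: |X| = xi * U^(-1/p), U uniform on (0, 1], with a random sign."""
    u = 1.0 - rng.random(size)                  # uniform on (0, 1]
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * xi * u ** (-1.0 / p)

rng = np.random.default_rng(3)
p, xi = 0.5, 1.0
x = sample_most_dispersed(p, xi, 200_000, rng)

ts = np.array([0.5, 1.0, 2.0, 5.0, 20.0])
empirical = np.array([np.mean(np.abs(x) < t) for t in ts])
print(np.c_[ts, empirical, envelope(ts, p, xi)])
# Weak-l_p constraint (6.2): t^p * P(|X| >= t) stays at xi^p for every t >= xi.
tail = np.array([t ** p * np.mean(np.abs(x) >= t) for t in ts[1:]])
```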

It turns out that this most dispersed distribution is also the least favorable distribution
for soft thresholding. In order to see this fact, define the function $\mathrm{mse}_0 : \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}$ by
letting

    $\mathrm{mse}_0(x; \tau) = E\{[\eta(x + Z; \tau) - x]^2\}$,    (6.7)

whereby expectation is taken with respect to $Z \sim N(0, 1)$. We then have the following useful
calculus lemma (see for instance [DMM09]).

Lemma 6.2. For each $\tau \in [0,\infty)$, the mapping $x \mapsto \mathrm{mse}_0(x;\tau)$ is strictly monotone
increasing in $x \in [0,\infty)$.

Now the mean square error of scalar soft thresholding, cf. Eq. (2.5), is given by

    $\mathrm{mse}(1; \nu, \tau) = E\,\mathrm{mse}_0(|X|; \tau)$,    (6.8)

where expectation is taken with respect to $X \sim \nu$. From the above remarks, we obtain
immediately the following characterization of the minimax problem.

Corollary 6.1 (Saddlepoint). Consider the game against Nature where the statistician
chooses the threshold $\tau$, Nature chooses the distribution $\nu \in \mathcal{F}_p^w(\xi)$, and the statistician
pays Nature an amount equal to the mean square error $\mathrm{mse}(1; \nu, \tau)$.
This game has a saddlepoint $(\tau_p^w(\xi), \nu_{p,\xi}^w)$, i.e. a pair satisfying

    $\mathrm{mse}(1; \nu_{p,\xi}^w, \tau) \ge \mathrm{mse}(1; \nu_{p,\xi}^w, \tau_p^w(\xi)) \ge \mathrm{mse}(1; \nu, \tau_p^w(\xi))$,    (6.9)

for all $\tau > 0$, and $\nu \in \mathcal{F}_p^w(\xi)$. In particular, the least-favorable probability measure is
$\nu_{p,\xi}^w = \nu_{p,\xi}$, with distribution $F_{p,\xi}$ given in closed form by Eq. (6.6), and we have the following
formula for the soft thresholding minimax risk:

    $M_p^w(\xi) = \inf_{\tau \ge 0} \mathrm{mse}(1; \nu_{p,\xi}, \tau)$.    (6.10)

Figure 9: Minimax soft thresholding MSE over weak-$\ell_p$ balls, $M_p^w(\xi)$, for various $p$. Vertical
axis: worst case MSE over $\mathcal{F}_p^w(\xi)$. Horizontal axis: $\xi$. Red, green, blue, aqua curves (from
bottom to top) correspond to $p$ = 0.25, 0.50, 0.75, 1.00.

Ordinarily, identifying a saddlepoint requires search over two variables, namely the
threshold $\tau$ and the distribution $\nu$. In the present problem we need only search over one
scalar variable, i.e. $\tau$. We can further make explicit the MSE calculation, by noting that,
by Eq. (6.6),

    $\mathrm{mse}(1; \nu_{p,\xi}, \tau) = p\,\xi^p \int_{\xi}^{\infty} \mathrm{mse}_0(x;\tau)\, x^{-p-1}\, dx$.    (6.11)

By a simple calculus exercise, this formula and Lemma 6.2 imply the following.

Lemma 6.3. The function $M_p^w(\xi)$ is strictly monotone increasing in $\xi \in (0,\infty)$. Hence,
the inverse function

    $(M_p^w)^{-1}(m) = \inf\{\xi : M_p^w(\xi) \ge m\}$,

is well-defined for $m \in (0, 1)$.
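
Formulas (6.10)-(6.11) reduce the computation of $M_p^w(\xi)$ to a one-dimensional search over $\tau$. The sketch below is a rough numerical illustration under stated assumptions: the integral in (6.11) is rewritten with the substitution $u = (\xi/x)^p$ as $\int_0^1 \mathrm{mse}_0(\xi u^{-1/p};\tau)\,du$ and approximated by a midpoint rule, $\mathrm{mse}_0$ is evaluated with the standard closed form for the soft-thresholding risk, and the threshold is searched over a finite grid; all function names and grid sizes are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def soft_risk(x, tau):
    """mse_0(x; tau) = E[(eta(x + Z; tau) - x)^2], Z ~ N(0,1), in closed form."""
    return (1.0 + tau**2
            + (x**2 - tau**2 - 1.0) * (norm.cdf(tau - x) - norm.cdf(-tau - x))
            - (tau - x) * norm.pdf(tau + x)
            - (tau + x) * norm.pdf(tau - x))

def weak_worst_case_mse(xi, p, tau, n_grid=20_000):
    """mse(1; nu_{p,xi}, tau) via (6.11), with u = (xi/x)^p and a midpoint rule on (0, 1)."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    return np.mean(soft_risk(xi * u ** (-1.0 / p), tau))

def weak_minimax_risk(xi, p, taus=None):
    """Grid approximation to M_p^w(xi) = inf_tau mse(1; nu_{p,xi}, tau), Eq. (6.10)."""
    if taus is None:
        taus = np.linspace(0.0, 6.0, 241)
    return min(weak_worst_case_mse(xi, p, t) for t in taus)

p = 0.5
risks = [weak_minimax_risk(xi, p) for xi in (0.001, 0.01, 0.05)]
print(risks)   # increasing in xi, in line with Lemma 6.3
```

Since the computed function is increasing in $\xi$, the inverse $(M_p^w)^{-1}$ can be obtained from it with any scalar root finder.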

The asymptotic behavior of $M_p^w(\xi)$ in the very sparse limit $\xi \to 0$ was derived in [Joh93].

Lemma 6.4 ([Joh93]). As $\xi \to 0$, the minimax threshold level achieving Eq. (6.10) is given
by

    $\tau_p^w(\xi) = \sqrt{2\log(1/\xi^p)} \cdot \{1 + o(1)\}$,

and the corresponding minimax mean square error behaves, in the same limit, as

    $M_p^w(\xi) = \frac{2}{2-p}\, \xi^p\, (2\log(1/\xi^p))^{1-p/2} \cdot \{1 + o(1)\}$.    (6.12)

Figure 10: Minimax soft threshold parameter, $\tau_p^w(\xi)$, various $p$. Vertical axis: minimax
threshold over weak-$\ell_p$ balls. Horizontal axis: $\xi$. Red, green, blue, aqua curves (from top to
bottom) correspond to $p$ = 0.25, 0.50, 0.75, 1.00.

Comparing with Lemma 2.3 we see that the minimax threshold $\tau_p^w(\xi)$ coincides
asymptotically with the one for strong $\ell_p$ balls. The corresponding risk is larger by a factor
$2/(2 - p)$, reflecting the larger set of possible distributions $\nu \in \mathcal{F}_p^w(\xi)$. For later use also
note:

    $(M_p^w)^{-1}(m) = \big((1 - p/2)\,m\big)^{1/p} \cdot \big(2\log(((1 - p/2)\,m)^{-1})\big)^{1/2 - 1/p} \cdot (1 + o(1))$,  $m \to 0$.

6.2 Minimax MSE in Compressed Sensing under Weak p-th Moments

We return now to the compressed sensing setup. In the noiseless case we consider sequences
of instances $S = \{I_{n,N}\} = \{(x_0^{(N)}, z^{(n)} = 0, A^{(n,N)})\}_{n,N}$ in $\mathcal{S}_p^w(\delta,\xi,0)$. The minimax
asymptotic mean square error of the LASSO is then given by considering the worst case sequence
of instances

    $M_p^{w,*}(\delta,\xi) = \sup_{S \in \mathcal{S}_p^w(\delta,\xi,0)} \inf_{\lambda \in \mathbb{R}_+} \mathrm{AMSE}(\lambda; S)$.    (6.13)

Here asymptotic mean-square error is defined as per Eq. (1.4).

Analogously, in the noisy case $\sigma > 0$, we consider sequences of instances $S \in \mathcal{S}_p^w(\delta,\xi,\sigma)$.
We then define the minimax risk as

    $M_p^{w,*}(\delta,\xi,\sigma) = \sup_{S \in \mathcal{S}_p^w(\delta,\xi,\sigma)} \inf_{\lambda \in \mathbb{R}_+} \mathrm{AMSE}(\lambda; S)$.    (6.14)

It turns out that complete analogs of the results of Sections 4 and 5 hold for the weak
p-th moment setting. Since the proofs are easy modifications of the ones for strong $\ell_p$ balls,
we omit them.

Theorem 6.1 (Noiseless Case, Weak p-th moment). For $\delta \in (0, 1)$, $\xi > 0$, the Minimax
AMSE of the LASSO over the weak-$\ell_p$ ball of radius $\xi$ is:

    $M_p^{w,*}(\delta,\xi) = \frac{\delta\,\xi^2}{\big((M_p^w)^{-1}(\delta)\big)^2}$,    (6.15)

where $(M_p^w)^{-1}(\delta)$ is the inverse function of the soft thresholding minimax risk, see Eq. (6.10).
Further we have:

Least Favorable $\nu$. The least-favorable distribution $\nu^{w,*}$ is the most dispersed distribution
$\nu_{p,\xi}$ whose distribution function is given by Eq. (6.6), with the same radius $\xi$.

Minimax Threshold. The minimax threshold $\lambda^{w,*}(\delta,\xi)$ is given by the calibration relation
(3.4) with $\tau = \tau^{w,*}(\delta,\xi)$ determined by:

    $\tau^{w,*}(\delta,\xi) = \tau_p^w\big((M_p^w)^{-1}(\delta)\big)$,    (6.16)

where $\tau_p^w(\cdot)$ is the soft thresholding minimax threshold, achieving the infimum in Eq. (6.10).

Saddlepoint. The pair $(\lambda^{w,*}, \nu^{w,*})$ satisfies a saddlepoint relation. Put for short $\mathrm{AMSE}(\lambda; \nu) =
\mathrm{AMSE}(\lambda; \delta, \nu, \sigma = 0)$. The minimax AMSE is given by

    $M_p^{w,*}(\delta,\xi) = \mathrm{AMSE}(\lambda^{w,*}; \nu^{w,*})$,

and

    $\mathrm{AMSE}(\lambda^{w,*}; \nu^{w,*}) \le \mathrm{AMSE}(\lambda; \nu^{w,*})$,  $\forall \lambda > 0$,    (6.17)
    $\mathrm{AMSE}(\lambda^{w,*}; \nu^{w,*}) \ge \mathrm{AMSE}(\lambda^{w,*}; \nu)$,  $\forall \nu \in \mathcal{F}_p^w(\xi)$.    (6.18)

As an illustration of this theorem, consider again the limit $\delta = n/N \to 0$ after $N \to \infty$
(equivalently, $n/N \to 0$ sufficiently slowly). It follows from Eq. (6.12) that

    $M_p^{w,*}(\delta, 1) = (1 - p/2)^{-2/p}\, \delta^{1-2/p}\, (2\log(\delta^{-1}))^{2/p-1}\, \{1 + o_\delta(1)\}$,  $\delta \to 0$.

We can also compute the minimax regularization parameter. Lemma 6.4 gives

    $\lambda^{w,*}(\delta,\xi) = \xi \cdot (1 - p/2)^{-1/p} \cdot \left(\frac{2\log(1/\delta)}{\delta}\right)^{1/p} \{1 + o(1)\}$,  $\delta \to 0$.    (6.19)

In the noisy case, we get a result in many respects similar to the p-th moment result.

Theorem 6.2 (Noisy Case, Weak p-th moment). For any $\delta, \xi > 0$, let $m^* = m_p^{w,*}(\delta,\xi)$ be
the unique positive solution of

    $\frac{m^*}{1 + m^*/\delta} = M_p^w\!\left(\frac{\xi}{(1 + m^*/\delta)^{1/2}}\right)$.    (6.20)

Then the LASSO minimax mean square error $M_p^{w,*}$ is given by:

    $M_p^{w,*}(\delta,\xi,\sigma) = \sigma^2\, m_p^{w,*}(\delta,\xi/\sigma)$.    (6.21)

Further, denoting by $\xi^* = (1 + m^*/\delta)^{-1/2}\,\xi/\sigma$, we have:

Least Favorable $\nu$. The least-favorable distribution $\nu^{w,*}$ is the most dispersed distribution
$\nu_{p,\xi}$ whose distribution function is given by Eq. (6.6).

Minimax Threshold. The minimax threshold $\lambda^{w,*}(\delta,\xi,\sigma)$ is given by the calibration relation
(3.4) with $\tau = \tau^{w,*}(\delta,\xi,\sigma)$ determined as follows:

    $\tau^{w,*}(\delta,\xi,\sigma) = \tau_p^w(\xi^*)$,    (6.22)

where $\tau_p^w(\cdot)$ is the soft thresholding minimax threshold, achieving the infimum in Eq. (6.10).

Saddlepoint. The above quantities obey a saddlepoint relation. Put for short $\mathrm{AMSE}(\lambda; \nu) =
\mathrm{AMSE}(\lambda; \delta, \nu, \sigma)$. The minimax AMSE obeys

    $M_p^{w,*}(\delta,\xi,\sigma) = \mathrm{AMSE}(\lambda^{w,*}; \nu^{w,*})$,

and

    $\mathrm{AMSE}(\lambda^{w,*}; \nu^{w,*}) \le \mathrm{AMSE}(\lambda; \nu^{w,*})$,  $\forall \lambda > 0$,    (6.23)
    $\mathrm{AMSE}(\lambda^{w,*}; \nu^{w,*}) \ge \mathrm{AMSE}(\lambda^{w,*}; \nu)$,  $\forall \nu \in \mathcal{F}_p^w(\xi)$.    (6.24)


7 Traditionally-scaled $\ell_p$-norm Constraints

This paper uses a non-traditional scaling $\|x_0\|_p^p \le N\xi^p$ for the radius of $\ell_p$ balls; traditional
scaling would be $\|x_0\|_p^p \le \xi^p$. In this section we discuss the translation between the two
types of conditions. We first define sequence classes based on norm constraints.

Definition 7.1. The traditionally-scaled $\ell_p$ problem suite $\tilde{\mathcal{S}}_p(\delta,\xi,\sigma)$ is the class of
sequences of problem instances $I_{n,N} = (x_0^{(N)}, z^{(n)}, A^{(n,N)})$ where:

(1) $n/N \to \delta$;

(2) $\|x_0^{(N)}\|_p^p \le \xi^p$ and, for some sequence $B = \{B_M\}_{M \ge 0}$ such that $B_M \to 0$, we have
$\sum_{i=1}^N (x_{0,i}^{(N)})^2\, \mathbb{I}(|x_{0,i}^{(N)}| > M) \le B_M N^{1-2/p}$ for every $N$;

(3) $z^{(n)} \in \mathbb{R}^n$, $\|z^{(n)}\|_2 \sim \sigma \cdot \sqrt{n} \cdot N^{-1/p}$, $(n,N) \to \infty$;

(4) $A^{(n,N)} \sim \mathrm{GAUSS}(n,N)$.

The traditionally-scaled weak $\ell_p$ problem suite $\tilde{\mathcal{S}}_p^w(\delta,\xi,\sigma)$ is defined using
conditions (1), (3), (4) and

(2$^w$) $\|x_0^{(N)}\|_{w\ell_p}^p \le \xi^p$ and, for some sequence $B = \{B_M\}_{M \ge 0}$ such that $B_M \to 0$, we have
$\sum_{i=1}^N (x_{0,i}^{(N)})^2\, \mathbb{I}(|x_{0,i}^{(N)}| > M) \le B_M N^{1-2/p}$ for every $N$.

Comparing our earlier definitions of standard $\ell_p$-constrained problem suites $\mathcal{S}_p(\delta,\xi,\sigma)$
and $\mathcal{S}_p^w(\delta,\xi,\sigma)$ with these new definitions, conditions (1) and (4) are identical; while the
new (2) and (3) are simply rescaled versions of corresponding conditions (2) and (3) in the
earlier standard problem suites.* To deal with such rescaling, we need the following scale
covariance property:

*Note the awkwardness of the noise scaling in the traditional scaling, as compared to the standard scaling
used here.

Lemma 7.1. Let $I = (x_0^{(N)}, z^{(n)}, A^{(n,N)})$ be a problem instance and $I_a = (a \cdot x_0^{(N)}, a \cdot
z^{(n)}, A^{(n,N)})$ be the corresponding dilated problem instance. Suppose that $\hat{x}_\lambda^{(N)}$ is the unique
LASSO solution generated by instance $I$ and $\hat{x}_\lambda^{(N),a}$ the unique solution generated by instance
$I_a$. Then

    $\|\hat{x}_{a\lambda}^{(N),a} - a\,x_0^{(N)}\|_2^2 = a^2 \cdot \|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2$

and

    $\inf_\lambda E\|\hat{x}_\lambda^{(N),a} - a\,x_0^{(N)}\|_2^2 = a^2 \cdot \inf_\lambda E\|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2$.
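
The first identity of Lemma 7.1 reflects the fact that dilating the data by $a$ and the penalty by $a$ dilates the LASSO minimizer by $a$. The sketch below illustrates this on a small random instance. It assumes the cost function $\frac12\|y - Ax\|_2^2 + \lambda\|x\|_1$ and uses a plain proximal-gradient (ISTA) iteration as a stand-in for an exact LASSO solver; the solver, the problem sizes and the values of `lam` and `a` are choices made for the example, not taken from the paper.

```python
import numpy as np

def soft_threshold(v, theta):
    """Entrywise soft thresholding eta(v; theta)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lasso_ista(y, A, lam, n_iter=500):
    """Approximately minimize 0.5*||y - A x||_2^2 + lam*||x||_1 by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + step * (A.T @ (y - A @ x)), step * lam)
    return x

rng = np.random.default_rng(4)
n, N, a, lam = 60, 120, 3.0, 0.1
A = rng.normal(size=(n, N)) / np.sqrt(n)
x0 = np.zeros(N)
x0[:8] = rng.normal(size=8)
z = 0.05 * rng.normal(size=n)

x_hat = lasso_ista(A @ x0 + z, A, lam)                   # instance I, penalty lam
x_hat_a = lasso_ista(A @ (a * x0) + a * z, A, a * lam)   # instance I_a, penalty a*lam

err = np.sum((x_hat - x0) ** 2)
err_a = np.sum((x_hat_a - a * x0) ** 2)
print(err_a, a ** 2 * err)   # squared errors differ by the factor a^2
```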



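Applying this lemma yields the following problem equivalences:

Corollary 7.1. We have the scaling relations:

    $\sup_{S \in \tilde{\mathcal{S}}_p(\delta,\xi,\sigma)} \inf_\lambda \mathrm{AMSE}(\lambda, S) = N^{-2/p} \cdot \sup_{S \in \mathcal{S}_p(\delta,\xi,\sigma)} \inf_\lambda \mathrm{AMSE}(\lambda, S)$;

and

    $\sup_{S \in \tilde{\mathcal{S}}_p^w(\delta,\xi,\sigma)} \inf_\lambda \mathrm{AMSE}(\lambda, S) = N^{-2/p} \cdot \sup_{S \in \mathcal{S}_p^w(\delta,\xi,\sigma)} \inf_\lambda \mathrm{AMSE}(\lambda, S)$.

Let's apply this to the noiseless $\ell_p$ ball constraint. By Theorem 4.1 we have

    $\min_\lambda \max_{S \in \mathcal{S}_p(\delta,\xi,0)} \mathrm{AMSE}(\lambda, S) = \frac{\delta\,\xi^2}{M_p^{-1}(\delta)^2}$.

Considering the unnormalized squared error $\|\hat{x}_\lambda - x_0\|_2^2$ and operating purely formally, define
a symbol $E$ so that when $x_0^{(N)}$ arises from a given sequence $S$,

    $E\|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2 = N \cdot \mathrm{AMSE}(\lambda, S)$.

Remembering $\delta = n/N$ we have

    $\min_\lambda \max_{S \in \mathcal{S}_p(\delta,\xi,0)} E\|\hat{x}_\lambda^{(N)} - x_0^{(N)}\|_2^2 = N \cdot (n/N)^{1-2/p} \cdot \xi^2 \cdot (2\log(N/n))^{2/p-1}\, \{1 + o_N(1)\}$.

Using the traditionally-scaled $\ell_p$ problem suite,

    $\min_\lambda \max_{S \in \tilde{\mathcal{S}}_p(\delta,\xi,0)} E\|\hat{x}_\lambda - x_0\|_2^2 = N^{-2/p} \cdot \min_\lambda \max_{S \in \mathcal{S}_p(\delta,\xi,0)} E\|\hat{x}_\lambda - x_0\|_2^2$,

where on the LHS we have $\tilde{\mathcal{S}}_p(\delta,\xi,0)$ while on the RHS we have $\mathcal{S}_p(\delta,\xi,0)$. We conclude

Corollary 7.2. Consider the noiseless, traditionally-scaled $\ell_p$ problem formulation. The
asymptotic MSE for the $\ell_2$-norm error measure has the asymptotic form

    $\min_\lambda \max_{S \in \tilde{\mathcal{S}}_p(\delta,\xi,0)} E\|\hat{x}_\lambda - x_0\|_2^2 = \xi^2 \cdot \left(\frac{2\log(N/n)}{n}\right)^{2/p-1} \{1 + o_N(1)\}$;    (7.1)

this is valid both for $n/N \to \delta \in (0, 1)$ and for $\delta = n/N \to 0$ slowly enough. The maximin
penalization has an elegant prescription when $n/N \to 0$ slowly enough:

    $\lambda^* = \xi \cdot \left(\frac{2\log(N/n)}{n}\right)^{1/p}$,  $S \in \tilde{\mathcal{S}}_p(\delta,\xi,0)$.    (7.2)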
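
The leading-order expressions (7.1) and (7.2) are straightforward to evaluate for given $n$, $N$, $p$ and $\xi$. The helper functions below are an illustrative sketch (their names and the example sizes are assumptions); they drop the $\{1 + o_N(1)\}$ factor, so the numbers are only indicative of the asymptotic regime $n/N \to 0$.

```python
import math

def minimax_error_traditional(n, N, p, xi=1.0):
    """Leading-order term of (7.1): xi^2 * (2 log(N/n) / n)^(2/p - 1)."""
    return xi ** 2 * (2.0 * math.log(N / n) / n) ** (2.0 / p - 1.0)

def maximin_lambda_traditional(n, N, p, xi=1.0):
    """Leading-order term of (7.2): xi * (2 log(N/n) / n)^(1/p)."""
    return xi * (2.0 * math.log(N / n) / n) ** (1.0 / p)

N, p = 10_000, 0.5
for n in (100, 200, 400):
    err = minimax_error_traditional(n, N, p)
    lam = maximin_lambda_traditional(n, N, p)
    print(f"n={n:4d}  error ~ {err:.3e}  lambda* ~ {lam:.3e}")
```

The two quantities are tied together by $(\lambda^*)^2 = \frac{2\log(N/n)}{n}$ times the leading-order error, which gives a quick consistency check on any implementation.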

Our results can now be compared with earlier results written in the traditional scaling.
We rewrite our result for $\xi = 1$, using a simple moment condition that implies uniform
integrability. For all sufficiently large $B$, and all $q > 2$, we obtained:

    $\min_\lambda \max_{\|x_0\|_p \le 1,\ \|x_0\|_q^q \le B N^{1-q/p}} E\|\hat{x}_\lambda - x_0\|_2^2 = \left(\frac{2\log(N/n)}{n}\right)^{2/p-1} \{1 + o(1)\}$.    (7.3)

In the case $\lambda = 0$, earlier results [Don06a, CT05] imply:

    $\max_{\|x_0\|_p \le 1} \|\hat{x}_0 - x_0\|_2^2 = O_P\!\left(\left(\frac{\log(N/n)}{n}\right)^{2/p-1}\right)$.    (7.4)

There are two main differences in technical content between the new result and earlier ones:

• The use of $E$ on the LHS of (7.3) versus $O_P(\cdot)$ on the RHS of (7.4).

• The supremum over $\{\|x_0^{(N)}\|_p \le 1\}$ on the LHS of (7.4) versus the supremum over
$\{\|x_0\|_p \le 1\} \cap \{\|x_0\|_q^q \le B N^{1-q/p}\}$ on the LHS of (7.3).

The main difference in results is of course that the new result gives a precise constant in
place of the $O(\cdot)$ result which was previously known. See Section 10.3 for further discussion.

The new result has the additional ingredient, not seen earlier, that we constrain not only
$\{\|x_0^{(N)}\|_p \le 1\}$ but also $\{\|x_0^{(N)}\|_q^q \le B N^{1-q/p}\}$. For each $p < 2$, this additional constraint
does indeed give a smaller set of feasible vectors for large $N$. See Section 10.2 for further
discussion.

A traditionally-scaled weak-.£p problem suite 5^(5, ^, a) can also be defined; without 
giving details, we have: 

Corollary 7.3. Consider the noiseless, traditionally-scaled weak-ip problem formulation. 
The asymptotic MSB for the I'^-norm error measure has the asymptotic form 

min max - = {I - p/2)-^''^ ■ e ■ i'^-^^^^^^''' \l + on{1)]-, (7.5) 

^ Se5-(5,C,0) V / 

this is valid both for n/N — )• 5 G (0, 1) and for 6 = n/N — )• slowly enough. The maximin 
penalization has an elegant prescription for n/N small: 

A- = (l-p/2r'".?-(^i;5^^)'", SeS;(«,0). (7.6) 
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8 Compressed Sensing over the Bump Algebra 



Our discussion involving $\ell_p$-balls is so far rather abstract. We consider here a stylized
application: recovering a signal $f$ in the Bump Algebra from compressed measurements.
Consider a function $f : [0, 1] \to \mathbb{R}$ which admits the representation

$$f(t) = \sum_{i=1}^{\infty} c_i \, g\big((t - t_i)/\sigma_i\big), \qquad g(x) = \exp(-x^2/2), \quad \sigma_i > 0. \tag{8.1}$$

Each term $g(\cdot)$ is a Gaussian 'bump' normalized to height 1, and we assume $\sum_i |c_i| \le 1$,
which ensures convergence of the series. The $c_i$ are signed amplitudes of the 'bumps' in
$f$. We refer to the book by Yves Meyer [Mey84] and also to the discussion in [DJ98],
which calls such objects models of polarized spectra. Any such function also has a wavelet
representation

$$f = \sum_{j \ge -1} \sum_{k \in I_j} \alpha_{j,k} \, \psi_{j,k},$$

where the $\psi_{j,k}$ are smooth orthonormal wavelets (for example Meyer wavelets or Daubechies
wavelets), and the wavelet coefficients obey $\sum_{j,k} |\alpha_{j,k}| \le C$. The constant $C$ depends only
on the wavelet basis [Mey84]. Here $j$ denotes the level index, and $k$ the position index. We
have $|I_{-1}| = 1$, and $|I_j| = 2^j$ for each $j \ge 0$. In other words, the collection of functions
with wavelet coefficients in an $\ell_1$-ball of radius $C$ contains the whole algebra of functions
representable as in (8.1).
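As a purely illustrative aside, a function of the form (8.1) can be evaluated numerically. The Python sketch below assumes a finite number of bumps; the name `bump_function` and the randomly drawn amplitudes, centers and widths are hypothetical choices made for illustration and are not part of the construction above.

```python
import numpy as np

def bump_function(t, amplitudes, centers, widths):
    """Evaluate f(t) = sum_i c_i * g((t - t_i) / s_i) with g(x) = exp(-x^2 / 2)."""
    t = np.asarray(t, dtype=float)[:, None]                  # shape (T, 1)
    c = np.asarray(amplitudes, dtype=float)[None, :]         # shape (1, m)
    centers = np.asarray(centers, dtype=float)[None, :]
    widths = np.asarray(widths, dtype=float)[None, :]
    z = (t - centers) / widths
    return np.sum(c * np.exp(-0.5 * z ** 2), axis=1)

# Hypothetical example: m = 5 bumps with signed amplitudes.
rng = np.random.default_rng(0)
m = 5
c = rng.standard_normal(m)
c /= np.sum(np.abs(c))               # enforce sum_i |c_i| = 1
centers = rng.uniform(0.0, 1.0, m)
widths = rng.uniform(0.01, 0.1, m)

t = np.linspace(0.0, 1.0, 1024)
f = bump_function(t, c, centers, widths)
```

Since each bump has height at most 1 and the amplitudes are absolutely summable with sum 1, the sampled values satisfy $|f(t)| \le 1$.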

Now consider compressed sensing of such an object. We fix a maximum resolution by
picking $N = 2^J$ and considering the finite-dimensional problem of recovering the object
$f_N = \sum_{j<J} \sum_{k=0}^{2^j-1} \alpha_{j,k} \psi_{j,k}$. The scale $2^{-J}$ corresponds to an effective discretization scale:
on intervals of length much smaller than $2^{-J}$, the function is approximately constant.
Reconstructing the function $f_N$ is equivalent to recovering the $2^J$ coefficients

$$x_0 = (\alpha_{-1,0}, \alpha_{0,0}, \alpha_{1,0}, \alpha_{1,1}, \alpha_{2,0}, \dots, \alpha_{2,3}, \alpha_{3,0}, \dots, \alpha_{J-1,0}, \dots, \alpha_{J-1,2^{J-1}-1}).$$

We know that coefficients at scales 1 through $J - 1$ combined have a total $\ell_1$-norm bounded
by a numerical constant $C$. Without loss of generality, we shall take $C = 1$ (this corresponds
to rescaling the constraint on the bump representation (8.1)).



Denote by $V_J$ the $2^J$-dimensional space of functions on $[0, 1]$ with resolution $2^{-J}$, i.e.

$$V_J \equiv \Big\{ \sum_{j<J} \sum_{k=0}^{2^j-1} \alpha_{j,k} \psi_{j,k} \; : \; \alpha_{j,k} \in \mathbb{R} \Big\}. \tag{8.2}$$

We can construct a random linear measurement operator $\mathcal{A} : V_J \to \mathbb{R}^n$ such that the matrix
$A$ representing $\mathcal{A}$ in the basis of wavelets has random Gaussian coefficients iid $N(0, 1)$. We
then take $n+1$ noiseless measurements: the scalar $\alpha_{-1,0} = \langle f, \varphi_{-1,0} \rangle$ associated to the 'father
wavelet', and the vector $y = \mathcal{A} f_N$. Notice that, since the measurements are noiseless, the
variance of the entries of the measurement matrix $A$ can be rescaled arbitrarily.

In the wavelet basis, the measurements can be rewritten as $y = A x_0$, where $A$ is an
$n \times N$ Gaussian random matrix. This is precisely a problem of the type studied in earlier
sections. Suppose now that we apply $\ell_1$-penalized least-squares

$$\hat{x}_\lambda = \arg\min_{x} \Big\{ \tfrac{1}{2} \|y - Ax\|_2^2 + \lambda \|x\|_1 \Big\}, \tag{8.3}$$

and denote the entries of the reconstruction vector by $\hat{x}_\lambda = (\hat\alpha_{0,0}, \dots, \hat\alpha_{J-1,2^{J-1}-1})$. The
function $f_N$ is therefore reconstructed as $\hat{f}_N$, where

$$\hat{f}_N = \sum_{j<J} \sum_{k=0}^{2^j-1} \hat\alpha_{j,k} \psi_{j,k}.$$

We adopt the performance measure

$$\mathrm{MSE}(\hat{f}_N, f_N) \equiv \mathbb{E}\big\{ \|\hat{f}_N - f_N\|^2_{L^2[0,1]} \big\} = \mathbb{E}\|x_0 - \hat{x}_\lambda\|_2^2,$$

where the last equality uses the orthonormality of the wavelet basis.
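To make the reconstruction step concrete, the following Python sketch solves a problem of the form (8.3) on synthetic coefficients and reports the squared coefficient error, which by orthonormality equals the squared $L^2$ error of the reconstructed function. The solver is plain iterative soft thresholding (ISTA), a generic proximal-gradient method swapped in here for simplicity; it is not the tool used for the analysis in this paper. The dimensions, the sparsity level, the $N(0, 1/n)$ scaling of the matrix and the value of `lam` are all assumptions made for illustration; in particular `lam` is not the minimax choice discussed in the text.

```python
import numpy as np

def soft_threshold(v, tau):
    """Scalar soft thresholding applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_objective(A, y, x, lam):
    """0.5 * ||y - A x||_2^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

def lasso_ista(A, y, lam, n_iter=2000):
    """Minimize the LASSO objective by proximal gradient descent (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Hypothetical test problem: k nonzero coefficients, ||x0||_1 = 1.
rng = np.random.default_rng(1)
N, n, k = 256, 128, 8
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x0 /= np.sum(np.abs(x0))

A = rng.standard_normal((n, N)) / np.sqrt(n)  # assumed scaling: iid N(0, 1/n)
y = A @ x0                                    # noiseless measurements

lam = 1e-3                                    # arbitrary illustrative value
x_hat = lasso_ista(A, y, lam)
sq_err = float(np.sum((x_hat - x0) ** 2))     # squared l2 error of the coefficients
print(sq_err)
```

Because the step size is $1/L$, the objective is non-increasing along the iterations, so the returned vector is never worse (in objective value) than the all-zero starting point.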

We wish to choose an appropriate value of $\lambda > 0$ to give the best reconstruction performance.
Note that the coefficient vector $x_0 \in \mathbb{R}^N$ satisfies by assumption

$$\|x_0\|_1 \le 1,$$

so we are in the setting of traditionally-scaled balls. The discussion of the last section now
applies; we obtain results by rescaling results from Theorem 4.1. Letting $\lambda^*(\delta, \xi)$ denote
the minimax threshold of Theorem 4.1, define

$$\lambda_N = N^{-1} \cdot \lambda^*(\delta, 1). \tag{8.4}$$

Corollary 8.1. Consider a sequence of functions $f_N \in V_J$ in the Bump Algebra (normed so
that the wavelet coefficients have $\ell_1$-norm bounded by 1). Consider Gaussian measurement
operators $\mathcal{A}_N : V_J \to \mathbb{R}^n$ indexed by the problem dimensions $N = 2^J$ and $n$. Let $\hat{f}_N$ denote
the reconstruction of $f_N$ using regularization parameter $\lambda = \lambda_N$ of (8.4).

(i) Assume $n/N \to \delta \in (0, 1)$. Then we have

$$\mathrm{MSE}(\hat{f}_N, f_N) \le N^{-1} \cdot M_1^*(\delta, 1) \, (1 + o(1)), \tag{8.5}$$

with $M_1^*(\delta, \xi)$ as in Theorem 4.1. This bound is asymptotically tight (achieved for a specific
sequence $f_N$).

(ii) Assume $n/N \to 0$ sufficiently slowly. Then we have

$$\mathrm{MSE}(\hat{f}_N, f_N) \le \frac{2\log(N/n)}{n} \cdot (1 + o(1)), \tag{8.6}$$

and the bound is asymptotically tight (achieved for a specific sequence $f_N$).



9 Compressed Sensing over Bounded Variation Classes 

Compressed sensing problems make sense for many other functional classes. The class of
Bounded Variation affords an application of our results on weak-$\ell_p$ classes.

1. Every bounded variation function $f \in BV[0, 1]$ has Haar wavelet coefficients in a
weak-$\ell_{2/3}$ ball.

2. Every $f \in BV[0, 1]^2$ has wavelet coefficients in a weak-$\ell_1$ ball [CDPX99].
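For readers who want to check membership in a weak-$\ell_p$ ball numerically, the sketch below computes the usual weak-$\ell_p$ quasi-norm $\max_k k^{1/p} |x|_{(k)}$, where $|x|_{(1)} \ge |x|_{(2)} \ge \dots$ are the sorted magnitudes. We assume this standard, unnormalized definition; the scaling conventions used elsewhere in this paper may differ by a power of $N$. The function name `weak_lp_norm` and the value $p = 1/2$ are hypothetical, chosen only for illustration.

```python
import numpy as np

def weak_lp_norm(x, p):
    """Weak-l_p quasi-norm: max over k of k^(1/p) * (k-th largest magnitude)."""
    mags = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    ranks = np.arange(1, mags.size + 1)
    return float(np.max(ranks ** (1.0 / p) * mags))

def lp_norm(x, p):
    """Ordinary l_p (quasi-)norm."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

# Power-law sequence |x|_(k) = k^(-1/p): the extremal element of the weak-l_p unit ball.
N, p = 1000, 0.5
x = np.arange(1, N + 1) ** (-1.0 / p)

print(weak_lp_norm(x, p))   # equals 1 up to rounding
print(lp_norm(x, p))        # strictly larger than 1, and growing with N
```

The power-law sequence lies on the boundary of the weak-$\ell_p$ unit ball while its ordinary $\ell_p$ norm grows with $N$, which illustrates why the weak ball is the larger class.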
We can develop a theory of compressed sensing over BV spaces following the previous
section, now using Haar wavelets. $V_J$ again denotes the span of wavelets of spatial scale $2^{-J}$ or
coarser. We let $d$ denote the spatial dimension ($d = 1$ or $2$ in the above examples). We use
regularization parameter

$$\lambda_N = N^{-1/p} \cdot \lambda_p^{w,*}(n/N, 1). \tag{9.1}$$

Corollary 9.1. Consider a sequence of functions $f_N \in V_J$ whose Haar wavelet coefficients
have weak-$\ell_p$ norm bounded by 1. Consider Gaussian measurement operators $\mathcal{A}_N : V_J \to \mathbb{R}^n$
indexed by the problem dimensions $N = 2^{Jd}$ and $n$. Let $\hat{f}_N$ denote the reconstruction of
$f_N$ using regularization parameter $\lambda = \lambda_N$ of (9.1).

(i) Assume $n/N \to \delta \in (0, 1)$. Then we have

$$\mathrm{MSE}(\hat{f}_N, f_N) \le N^{1-2/p} \cdot M_p^{w,*}(\delta, 1) \, (1 + o(1)), \tag{9.2}$$

with $M_p^{w,*}(\delta, \xi)$ as in Theorem 6.1. This bound is asymptotically tight (achieved for a specific
sequence $f_N$).

(ii) Assume $n/N \to 0$ sufficiently slowly. Then we have

$$\mathrm{MSE}(\hat{f}_N, f_N) \le \Big(1 - \frac{p}{2}\Big)^{-2/p} \cdot \left(\frac{2\log(N/n)}{n}\right)^{2/p-1} \cdot (1 + o(1)), \tag{9.3}$$

and the bound is asymptotically tight (achieved for a specific sequence $f_N$).
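To get a feeling for the orders of magnitude in the extreme-undersampling regime, the following sketch evaluates the leading-order rate numerically. It assumes that the leading term has the form $C \cdot (2\log(N/n)/n)^{2/p-1}$, with $C = 1$ for ordinary $\ell_p$ balls and $C = (1 - p/2)^{-2/p}$ for weak-$\ell_p$ balls; this is our reading of the small-$n/N$ formulas and should be checked against the corollaries before being relied upon. The $(1 + o(1))$ factor is ignored, and the function name `small_ratio_rate` is hypothetical.

```python
import math

def small_ratio_rate(N, n, p, weak=False):
    """Leading-order term C * (2 log(N/n) / n)^(2/p - 1), ignoring the 1 + o(1) factor.

    Assumed form only: C = 1 for ordinary l_p balls, C = (1 - p/2)^(-2/p) for weak-l_p balls.
    """
    if not (0.0 < p <= 1.0):
        raise ValueError("p must lie in (0, 1]")
    if not (0 < n < N):
        raise ValueError("need 0 < n < N")
    rate = (2.0 * math.log(N / n) / n) ** (2.0 / p - 1.0)
    const = (1.0 - p / 2.0) ** (-2.0 / p) if weak else 1.0
    return const * rate

# Hypothetical problem sizes: N = 2^20 unknowns, n = 2^10 measurements.
N, n = 2 ** 20, 2 ** 10
print(small_ratio_rate(N, n, p=1.0))               # ordinary l_1 ball
print(small_ratio_rate(N, n, p=1.0, weak=True))    # weak-l_1 ball
print(small_ratio_rate(N, n, p=2.0 / 3.0, weak=True))
```

Under these assumptions, passing from the ordinary $\ell_1$ ball to the weak-$\ell_1$ ball costs a constant factor $(1 - 1/2)^{-2} = 4$ at leading order.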

Although BV offers only the applications $p = 1$ ($d = 2$) and $p = 2/3$ ($d = 1$), weak-$\ell_p$
spaces arise elsewhere, and serve as useful models for image content. For example, for images
containing smooth edges, we have the following model: every $f : [0, 1]^2 \to \mathbb{R}$ which is locally
in $C^2$ except at 'edges' has curvelet coefficients levelwise in weak-$\ell_{2/3}$ balls [CD04]. Our
compressed sensing result for BV can be adapted without change to the conclusions for such
a setting, after replacing the role of Haar wavelets by curvelets.

10 Discussion 

In this last section we discuss some specific aspects of our results and overview (in an 
unavoidably incomplete way) the related literature. 

10.1 Equivalence of Random and Deterministic Signals/Noises 

A striking aspect of our results is the equivalence of random and deterministic signals
and noises (traceable here to Proposition 3.1). The AMSE formula in the general case,
as given by Eq. (3.5), depends on the sequence of signals $x_0^{(N)}$ and of noise vectors $z^{(N)}$
only through simple statistics of such vectors. More precisely, it depends only on their
asymptotic empirical distributions, respectively $\nu$ and $\omega$. In fact the dependence on $z^{(N)}$ is
even weaker: the asymptotic risk only depends on the limit second moment $\mathbb{E}_\omega(Z^2)$.

At first sight, these findings are somewhat surprising. For instance, we might replace $x_0^{(N)}$
with a random vector with i.i.d. entries with common distribution $\nu$ without changing the
asymptotic risk. This asymptotic equivalence between random and deterministic signals is in
fact a quite simple and robust consequence of the absence of structure of the measurement
matrix $A$. We do not spell out the details here, but note the following simple facts:
1. Under our model for $A$, the columns of $A$ are exchangeable, so there is no distributional
difference between $Ax_0$ and $APx_0$, for any permutation matrix $P$.

2. As a consequence, there is no difference in expected performance between a fixed
vector $x_0$ and a random vector obtained by permuting the entries of $x_0$ uniformly at
random.

3. Asymptotically for large $N$ there is a negligible difference in performance between a
fixed vector $x_0$ and the typical random vector obtained by sampling with replacement
from the entries of $x_0$.

This argument implies that we can replace the deterministic vectors $x_0^{(N)}$ with random
vectors with i.i.d. entries. As the argument clarifies, this phenomenon ought to exist for
more general models of $A$.
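The first and third facts can be checked numerically on a toy example. The sketch below is only an illustration under assumptions of our own: the dimensions, the particular three-valued vector `x0`, and the use of the fraction of nonzero entries as a summary of the empirical distribution are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 50, 2000

# A fixed (deterministic) vector: 10% of the entries are +1 or -1, the rest are 0.
x0 = np.zeros(N)
x0[:100] = 1.0
x0[100:200] = -1.0

# Fact 1: permuting the entries of x0 is the same as permuting the columns of A,
# and a column-permuted Gaussian matrix has the same distribution as A itself.
A = rng.standard_normal((n, N))
perm = rng.permutation(N)
x_perm = x0[perm]                      # the permuted signal P x0
A_perm = np.empty_like(A)
A_perm[:, perm] = A                    # the column-permuted matrix A P
same_measurements = np.allclose(A @ x_perm, A_perm @ x0)

# Fact 3: sampling with replacement from the entries of x0 nearly preserves
# the empirical distribution when N is large.
x_boot = rng.choice(x0, size=N, replace=True)
frac_nonzero_fixed = float(np.mean(x0 != 0))
frac_nonzero_boot = float(np.mean(x_boot != 0))

print(same_measurements, frac_nonzero_fixed, frac_nonzero_boot)
```

The resampled vector has a fraction of nonzeros that fluctuates around that of `x0` on the scale $N^{-1/2}$, which is the sense in which the difference becomes negligible for large $N$.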

10.2 Comparison with Previous Approaches 

Much of the analysis of compressed sensing reconstruction methods has relied so far on
a kind of qualitative analysis. A typical approach has been to frame the analysis in
terms of 'worst case' conditions on the measurement matrix $A$. A useful set of conditions
is provided by the restricted isometry property (RIP) [CT05], [CRT06] and refinements
[BRT09], [vdGB09], [BGI+08]. These conditions are typically pessimistic, in that they assume
that the signal $x_0$ is chosen adversarially, but they capture the correct scaling behavior.

The advantage of this approach is its broad applicability; since one assumes little about
the matrix $A$, the derived bound will perhaps apply to a wide range of matrices. However,
there are two limitations:

(a) These conditions have been proved to hold with linear scaling of $\|x_0\|_0$ and $n$ with the
signal dimension $N$ only for specific random ensembles of measurement matrices, e.g.
random matrices with i.i.d. subexponential entries.

(b) The resulting bounds typically only hold up to unspecified numerical constants. Efforts
to make precise the implied constants in specific cases (see for instance [BCT11])
show that this approach imposes restrictive conditions on the signal sparsity. For
instance, for a Gaussian measurement matrix with undersampling ratio $\delta = 0.1$, RIP
implies successful reconstruction [BCT11] only if $\|x_0\|_0 \le 0.0002\,N$. In empirical
studies, a much larger support appears to be tolerated.

The present paper works with only one matrix ensemble - Gaussian random matrices -
but gets quantitatively precise results, like the companion works [DMM09], [DMM10], [BM10].
The approach provides sharp performance guarantees under suitable probabilistic models
for the measurement process.

To be concrete, consider the case of $x_0$ belonging to the weak-$\ell_p$ ball of radius 1,
$\|x_0\|_{w\ell_p} \le 1$. Building on the RIP theory, the review paper [Can06] derives the bound

holding for Gaussian measurement matrices $A$, and for unspecified constant $C$. Analogous
minimax bounds for $\ell_p$ balls are known [Don06a], [RWY09]. Our results have the same form,
but with specific constants, e.g. $C = (1 - (p/2))^{-2/p}$ for weak-$\ell_p$ balls, cf. Eq. (7.5), and
$C = 1$ for ordinary $\ell_p$ balls, cf. Eq. (7.1). Moreover, these constants are sharp, i.e. attained
by specifically described $x_0$.

Let us finally mention the recent paper [CP10], which takes a probabilistic point of view
similar to that of [DMM10] and of the present paper, although using different techniques.
This approach avoids using RIP or similar conditions, and applies to a broad family of
matrices with i.i.d. rows. On the other hand, it only allows one to prove upper bounds on MSE
that are off by logarithmic factors.

10.3 Comparison to the theory of widths 

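Recall that the Gel'fand $n$-width of a set $K \subseteq \mathbb{R}^N$ with respect to the norm $\|\cdot\|_X$ is
defined as

$$d_n(K, X) = \inf_{A \in \mathbb{R}^{n \times N}} \; \sup_{v \in K \cap \ker(A)} \|v\|_X, \tag{10.2}$$

where $\ker(A) = \{v \in \mathbb{R}^N : Av = 0\}$. Here we shall consider $K$ to be the $\ell_p$ ball of radius 1,
$B_p^N \equiv \{x \in \mathbb{R}^N : \|x\|_p \le 1\}$, and fix $\|\cdot\|_X$ to be the ordinary $\ell_2$ norm. A series of works
[Kas77], [GG84], [Don06a], [FPRU10] established that

$$d_n(B_p^N, \ell_2) \ge c_p \left(\frac{1 + \log(N/n)}{n}\right)^{1/p - 1/2} \tag{10.3}$$

as long as the term in parenthesis is smaller than 1.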

The interest for us lies in the well-known observation [Don06a] that $d_n(B_p^N, \ell_2)$ provides
a lower bound on the compressed sensing mean square error under arbitrary reconstruction
algorithm, and for arbitrary measurement matrix $A$. In particular

$$\max_{x_0 \in B_p^N} \|\hat{x} - x_0\|_2 \ge d_n(B_p^N, \ell_2). \tag{10.4}$$

So it makes sense to define the inefficiency of a certain matrix/reconstruction procedure as
the ratio of the two sides in the above inequality

$$r_{\mathrm{alg}}(B_p^N, \ell_2) \equiv \frac{1}{d_n(B_p^N, \ell_2)} \max_{x_0 \in B_p^N} \|\hat{x} - x_0\|_2. \tag{10.5}$$

This ratio implicitly depends on the matrix $A$. In the case $p = 1$, $X = \ell_2$ it is known that $\ell_1$
minimization is inefficient at most by a factor 2:

$$r_{\min \ell_1}(B_1^N, \ell_2) \le 2 \tag{10.6}$$

(for example [Don06a] showed this by invoking [TW80]).
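The mechanism behind the lower bound (10.4) can be illustrated numerically for a fixed matrix. If $v$ lies in the kernel of $A$ and in the unit $\ell_1$ ball, then $x_0 = v$ and $x_0 = -v$ produce the same measurements $y = 0$, so any reconstruction rule must incur error at least $\|v\|_2$ on one of them. The sketch below searches randomly for kernel vectors of unit $\ell_1$ norm with large $\ell_2$ norm; the result is only a lower bound on the supremum over the kernel for this particular $A$, and says nothing directly about the infimum over all matrices. The dimensions, the random search and its number of trials are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 20, 50
A = rng.standard_normal((n, N))

# Orthonormal basis of ker(A): the last N - n right singular vectors
# (a Gaussian matrix has rank n with probability one).
_, _, Vt = np.linalg.svd(A, full_matrices=True)
kernel_basis = Vt[n:].T                      # shape (N, N - n)

# Crude random search over kernel directions, rescaled to unit l1 norm.
best = 0.0
for _ in range(2000):
    v = kernel_basis @ rng.standard_normal(N - n)
    v /= np.sum(np.abs(v))                   # now ||v||_1 = 1 and A v = 0
    best = max(best, float(np.linalg.norm(v)))

print(best)   # lower bound on the sup of ||v||_2 over the unit l1 ball intersected with ker(A)
```

Since $\|v\|_1 / \sqrt{N} \le \|v\|_2 \le \|v\|_1$, the value found always lies between $N^{-1/2}$ and 1; a more careful (non-convex) optimization could only increase it.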

Our work concerns random Gaussian matrices and LASSO reconstruction. Since the
worst-case performance of the optimally-tuned LASSO cannot be worse than the worst-case
performance of min-$\ell_1$ reconstruction, and since we have a formal expression for the
worst-case AMSE of optimally-tuned LASSO, the asymptotic formula (7.1) together with
the bound (10.3) implies for all sufficiently large $B$ and any $q > 2$ that



^Ellx^) - xoll = .^MEM^i + ,(1)), (10.7) 

with $E$ defined in analogy with Section 7. The constant $B$ is arbitrary, which suggests (but
of course does not prove) that we can remove the hypothesis $\|x_0\|_q^q \le B N^{1-q/p}$ completely.
On the other hand, for a fixed matrix $A \in \mathbb{R}^{n \times N}$, we can define the width

$$d_n(K, A, X) = \sup_{v \in K \cap \ker(A)} \|v\|_X,$$

so that the Gel'fand $n$-width is the infimum of this quantity over $A$. Using results of Donoho
and Tanner [DT10] one can give the lower bound for $p = 1$ and Gaussian random matrices



<!„«,_4,^.)>y'^(l + o(l)). 



The right hand side of Eq. (10.7) is quantitatively quite close to the right-hand side of 
the last display. Hence the results of this paper suggest that statistical methods may also 
provide geometric information. 

In the general case $0 < p < 1$, lower bounds on $c_p$ are given in [FPRU10], but they do
not appear as tight as desirable.



10.4 About the Uniform Integrability Condition 

We have just seen once again that our hypotheses on $\ell_p$ balls can be scaled to match
$\|x_0\|_p \le 1$ but then they also include the hypothesis $\|x_0\|_q^q \le B N^{1-q/p}$. It may seem
at first glance that this is a serious additional constraint; it implies that the entries in $x_0$
cannot be very large as $N$ increases, whereas the condition $\|x_0\|_p \le 1$ of course permits
entries as large as 1.

However, note that our analysis identifies the least-favorable $x_0$, and that the constant
$B$ plays no role. In fact, if we make a homotopy between the least-favorable object and
objects requiring larger $B$, we find that the AMSE is decreasing in the direction of larger $B$.
Pushing things to the extreme where $B$ goes unbounded, of course our analysis techniques
no longer rigorously apply, but it is quite clear that this is an unpromising direction to
move. Hence we believe that this is largely a technical condition, caused by our method of
proof.